ABSTRACT


A general theory is developed for obtaining linear estimates, of
minimum mean square error, of components of random processes. The
fundamental requirement for applying the theory is that the process in
question be vector Markov in the wide sense, a requirement typically
satisfied exactly and almost invariably satisfied approximately. This
estimation theory has considerable advantages over methods of functional
analysis mentioned in Section II in that: (a) it is concerned with vector
observation functions which may be time-dependent linear combinations of
components of the process; (b) very general nonstationary processes are
treated; (c) explicit solutions are obtained without difficulty; (d)
linear prediction is an immediate consequence of linear estimation; and
(e) estimates are obtained sequentially. In contrast to the typical
approach of considering the observation as the sum of a quantity of
interest (sometimes called the signal or a linear combination of unknown
parameters), together with disturbances (sometimes called noise or errors
of observation), the observation is considered a linear combination of
components of a vector process, and the best linear estimate of a complete
state vector of the sample process is obtained.

(The essentials of the theory were presented in an internal memorandum,
CLM-30, 1960, and a more complete exposition, with examples, in CLM-46
and CLM-46A, 1961.)
I. INTRODUCTION


The basic problem considered is the sequential determination of best
linear estimates of a vector x(t) based on non-denumerable vector
observations linear in x(t), for {x(t)} vector Markov processes in the wide
sense; best linear estimates $\hat{x}(t)$ are by definition linear estimates
minimizing the variance of estimate,
$E[\hat{x}(t) - x(t)][\hat{x}(t) - x(t)]^{\sim}$.

Wide sense vector Markov processes, considered in Appendix B, are
those processes {x(t)} such that, for any time sequence
$\tau_1 \le \tau_2 \le \cdots \le \tau_\nu$, the best linear estimate of
$x(\tau_\nu)$ on the basis of $x(\tau_1), \cdots, x(\tau_{\nu-1})$ is a
linear function of $x(\tau_{\nu-1})$ not depending on
$x(\tau_1), \ldots, x(\tau_{\nu-2})$. This is equivalent to

    $x(\tau_\nu) = L_{\nu,\nu-1} x(\tau_{\nu-1}) + v_\nu$,

with the transition matrices $L_{\nu,\nu-1}$ not depending on the sample
function x(t), and $v_\nu$ orthogonal to all preceding v and x. With slight
restrictions, wide sense vector Markov processes satisfy the relations
($\tau_2 > \tau_1$),

    $x(\tau_2) = K(\tau_2,\tau_1) x(\tau_1) + w(\tau_2,\tau_1)$,

    $dK(\tau_2,\tau_1)/d\tau_2 = B(\tau_2) K(\tau_2,\tau_1)$,

with the $w(\tau_2,\tau_1)$ corresponding to non-overlapping intervals
mutually orthogonal, B(t) and Q(t) integrable square matrices, and Q(t)
non-negative definite. The structure of {x(t)} is thus characterized by
B(t) and Q(t).

We consider observations $y(\tau) = A(\tau)x(\tau)$ made over a finite or
non-denumerable set of values of $\tau$, and obtain, sequentially, best
linear estimates $\hat{x}(\tau)$ in extremely general cases which may involve
time-varying $A(\tau)$, $B(\tau)$, and $Q(\tau)$. Best linear prediction is
a simple consequence of best linear estimation. The assumption that {x(t)}
is a wide sense vector Markov process does not appear to be very
restrictive.




The basis of the analysis is the fact that best linear estimates and
observations on which they are based are orthogonal to resulting errors of
estimate; this and the linear structure of {x(t)} permit orthogonalization
of successive observations by subtracting from each observation its best
linear prediction.

Section III treats a finite set of observations; Section IV, continuous
processes {x(t)}; Section V, continuous observations; Section VI, some
aspects of Gaussian x(t).

Standard vector notation is used, with E the symbol for expectation;
A, B, C, D, F, G, ... for matrices; a, b, c, ... for column vectors; and
$\alpha, \beta, \gamma, \cdots$ for scalars. The transpose of a vector or
matrix is denoted by the tilde. I and 1 represent identity matrices and
column vectors of unity components, respectively. The symbol $\triangleq$
stands for "is equal by definition to." Time derivatives are always
considered as derivatives to the right.

To avoid undue complexity in writing, we use the short notation:

    $F'(t) \triangleq dF(t)/dt$

    $F'(\sigma,t) \triangleq \partial F(\sigma,t)/\partial\sigma$

    $\dot{F}(\sigma,t) \triangleq \partial F(\sigma,t)/\partial t$.

That is, the prime denotes the derivative of the function with respect to
the first time variable, the dot the derivative with respect to the second
time variable. One further special symbol is the asterisk, representing
the operation:

    $*F(t) \triangleq F'(t) + F(t) B(t)$,

with B(t) a specified matrix partially characterizing {x(t)}. By $*^2$, we
represent the repeated operation

    $*^2 F(t) \triangleq *[*F(t)]$.

The concept of orthogonality of two random vectors is used extensively.
Vectors x and y, whether or not of equal dimension, are said to be
orthogonal if, and only if, the matrix $E x\tilde{y}$ is zero.
II. HISTORICAL NOTE


The best linear absolutely unbiased estimate of a vector x based on a
finite vector y of observations linear in x has been well known for many
years. The standard demonstration, showing that the estimate minimizing a
quadratic form in residuals yields the minimum variance absolutely unbiased
estimate of every linear combination of x, is called the Gauss-Markov
Theorem; this corresponds to Conclusions D and F of Theorem 0. Theorem 1,
regarding the best linear estimate of a vector x based on a vector y of
observations, is presumably also well known, but the author has not found
it in the literature. To the best of the author's knowledge, the first work
on sequential calculation of best absolutely unbiased linear estimates was
that of J. W. Follin, Jr., about 1955, in the first case considered in
Section III.5.

Kolmogorov, 1941, obtained the best linear prediction of a wide sense
stationary scalar process observed over a sequence of uniformly spaced
times extending indefinitely into the past. Wiener, 1949, considered scalar
observations, continuously or at uniform intervals, extending indefinitely
into the past, on the sum of a signal process and a noise process, each with
absolutely continuous spectral distribution function, and obtained the best
linear estimate of the signal; convenient instrumentation gave estimates
sequentially. Estimation problems involving multiple observations were
treated with considerable success, but gave rise to the difficult problem
of properly factoring spectral density matrices. The reader is referred to
Doob, 1953, Chapter XII and Supplement, for a discussion of estimation
based on scalar observations on wide sense stationary processes observed
indefinitely into the past, and remarks on contributions to such problems.

Considerable effort has been devoted to more realistic and general
problems related to those solved by Wiener. The basic problem studied is
the linear estimation of $m(t) + \zeta_1(t)$, given observations made
continuously over an interval $[0, t_1]$ of

    $\eta(t) = m(t) + \zeta_1(t) + \zeta_2(t)$,

with $m(t) = \sum_{\alpha=1}^{\nu} \lambda_\alpha \phi_\alpha(t)$, where the
$\phi_\alpha(t)$ are known functions of time, and $\zeta_1(t)$ and
$\zeta_2(t)$ are zero-mean processes, not necessarily stationary. Sometimes
the best linear estimate of $m(t) + \zeta_1(t)$, subject to the restriction
that the estimates of m(t) be absolutely unbiased, is desired; we call this
the best hybrid linear estimate. Methods employed include extensions of
the Wiener-Hopf integral equation, expansion of random functions in terms
of a denumerable infinity of orthogonal random variables (usually by the
methods of Karhunen, 1947 and Loève, 1946), Green's functions, and
reproducing kernel Hilbert spaces. Zadeh and Ragazzini, 1950, considered
cases with $\zeta_1(t)$ and $\zeta_2(t)$ having absolutely continuous
spectral distribution functions, extended the Wiener-Hopf integral equation,
and expressed best hybrid linear estimates in terms of solutions of integral
equations. Grenander, 1950, obtained the explicit solution for $\lambda$ in
the case $m(t) = \lambda$, $\zeta_1(t) = 0$, $\zeta_2(t)$ autoregressive and
stationary, using Karhunen-Loève expansions. Davis, 1952, characterized the
solution of cases with m(t) polynomial in time in terms of characteristic
values and characteristic functions appearing in the Karhunen-Loève
expansions. Somewhat similar methods were applied by Kallianpur, 1959, in
cases with $\zeta_1(t)$ uncorrelated with $\zeta_2(t)$. Pugachev, 1960, also
uses such expansions, which he calls canonic representations, to
characterize estimation problems. Dolph and Woodbury, 1952, used Green's
functions in cases with $\zeta_1(t)$ and $\zeta_2(t)$ uncorrelated and
autoregressive, and generalized to some extent the results of Grenander,
1950; in more complicated cases, they express the results in terms of
solutions of integral equations. Bendat, 1955, generalized the Wiener-Hopf
integral equation and solved estimation problems with $\zeta_1(t) = 0$,
$\zeta_2(t)$ having a damped exponential cosine autocorrelation function,
and m(t) a finite Fourier series. Shinbrot's work was similar to that of
Bendat; he set m(t) = 0 and solved some special cases. Parzen, 1960,
characterized the problem in terms of reproducing kernel Hilbert spaces.
The explicit solutions of estimation problems considered in this paragraph
are limited to a few very special cases, most studies leaving difficult
problems involving solution of integral equations, iterative evaluation of
reproducing kernel inner products, or determining an infinite sequence of
characteristic values and characteristic functions.

Problems of best linear estimation in the fictional limiting case in
which observations include white noise have been completely solved, even
with vector observations, although frequently in a non-rigorous manner.
Important work on this problem was done by Follin (Carlton and Follin, 1955),
based on sequential calculation of best absolutely unbiased linear
estimates. General results for the white noise case are given in Kalman and
Bucy, 1961.

The methods of the present paper are much more elementary than those
used in the cited literature, and lead to explicit solutions in very
general situations. The key assumption of the present paper is that the
random process in question is a wide sense vector Markov process. The
explicit solutions found in the literature assume this and considerably
more: first, scalar observation functions, and second, particular types of
wide sense vector Markov processes, e.g., autoregressive processes.
III. FINITE SET OF OBSERVATIONS


In this section we consider a finite number of observations, say
$y_1, y_2, \cdots, y_\nu$, functions of the x we wish to estimate. We
consider estimates $x^*$ which are linear combinations of the observations,
and wish to choose $x^*$ to minimize the variance of estimate

    $S^* \triangleq E(x - x^*)(x - x^*)^{\sim}$.

An estimate $x^*$ is said to be unbiased if and only if $E(x - x^*) = 0$.
It will be shown that the $x^*$ minimizing $S^*$, denoted by $\hat{x}$ with
corresponding $\hat{S}$, is unbiased. An estimate $x^*$ is said to be
absolutely unbiased if and only if $E(x^* \mid x) = x$ for arbitrary x. The
$x^*$ minimizing $S^*$ among absolutely unbiased $x^*$ is denoted by
$\bar{x}$, with corresponding $\bar{S}$. It appears reasonable to consider
$\hat{x}$ the "best" estimate of x, and $\bar{x}$ the best absolutely
unbiased estimate of x, and to restrict the use of $\bar{x}$ as an estimate
to those cases in which it is not possible to calculate $\hat{x}$.

We shall consider first $\bar{x}$, then $\hat{x}$, and compare these
estimates in the case in which either can be obtained. Sequential
calculation of $\hat{x}$ and $\hat{S}$ will then be developed. We shall
finally consider sequential calculation of $\bar{x}$ by use of these
results. In Appendix A, the best absolutely unbiased estimate of x(t) in a
non-trivial random process is considered.


III.1 The Best Absolutely Unbiased Estimate, $\bar{x}$


The theory of $\bar{x}$ is classical least squares theory, developed by
Gauss, Markov, and others. We summarize the results in Theorem 0 and its
corollaries. (In Theorem 0, y and v are column vectors of $\mu$ components;
x, $x_{F,g}$, g, $\bar{x}$ are column vectors of $\nu$ components. M is
$\mu \times \nu$; R, H are $\mu \times \mu$; $S_{F,g}$, $\bar{S}$ are
$\nu \times \nu$; F, $\bar{F}$ are $\nu \times \mu$; A is
$\sigma \times \nu$; G, B are $\sigma \times \mu$. The hypotheses imply that
$\mu \ge \nu$.)


Theorem 0. Given an observation vector

(0.0.1)  $y = Mx + v$,

v having zero mean for every x and finite variance

(0.0.2)  $R \triangleq E v\tilde{v}$,

with R and $\tilde{M}R^{-1}M$ nonsingular; M, F, and g constants.

A. The linear estimate of x,

(0.0.3)  $x_{F,g} \triangleq Fy + g$

is absolutely unbiased, i.e., for all x

    $E(x_{F,g} \mid x) = x$,

if and only if

(0.1.1)  $g = 0$, and
(0.1.2)  $FM = I$.

B. Of the estimates $x_{F,g}$ satisfying FM = I, the estimate which
minimizes the variance of estimate

(0.0.4)  $S_{F,g} \triangleq E(x_{F,g} - x)(x_{F,g} - x)^{\sim}$

is

(0.0.5)  $\bar{x} \triangleq (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}y$,

i.e.,

(0.0.6)  $\bar{F} \triangleq (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}$,
(0.0.7)  $\bar{g} \triangleq 0$.

The minimum variance of estimate is

(0.1.3)  $\bar{S} \triangleq E(x - \bar{x})(x - \bar{x})^{\sim} = \bar{F}R\bar{F}^{\sim} = (\tilde{M}R^{-1}M)^{-1}$.

The variance of estimate associated with $x_{F,g}$ is

(0.1.4)  $S_{F,g} = \bar{S} + (F - \bar{F})R(F - \bar{F})^{\sim} + g\tilde{g}$.

C. $\bar{x} - x$ is orthogonal to x and to the residuals $y - M\bar{x}$,
which have mean zero and variance

(0.1.5)  $E(y - M\bar{x})(y - M\bar{x})^{\sim} = R - M(\tilde{M}R^{-1}M)^{-1}\tilde{M}$.

D. $\bar{x}$ is the value of x which minimizes

    $(y - Mx)^{\sim} R^{-1} (y - Mx)$.

E. $\bar{x}$ is invariant under nonsingular linear transformation of y.

F. The minimum variance absolutely unbiased linear estimate of
Ax is $\overline{(Ax)} = A\bar{x}$, for A any constant.
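
As a numerical illustration of Theorem 0, the following sketch evaluates (0.0.5), (0.0.6) and (0.1.3) with NumPy. The matrices M and R, the vector called `x_true`, and the helper name `best_unbiased_estimate` are hypothetical choices made only for this example; it is a sketch under those assumptions and not part of the original development.

```python
import numpy as np

def best_unbiased_estimate(M, R, y):
    """Return (x_bar, F_bar, S_bar) of Theorem 0 for y = M x + v, E v v~ = R."""
    R_inv = np.linalg.inv(R)
    S_bar = np.linalg.inv(M.T @ R_inv @ M)   # (0.1.3): (M~ R^-1 M)^-1
    F_bar = S_bar @ M.T @ R_inv              # (0.0.6)
    return F_bar @ y, F_bar, S_bar           # (0.0.5): x_bar = F_bar y

# Hypothetical example: mu = 3 observations of a vector with nu = 2 components.
M = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
R = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 4.0]])
x_true = np.array([1.0, -2.0])
y = M @ x_true                               # noise-free observation (v = 0)

x_bar, F_bar, S_bar = best_unbiased_estimate(M, R, y)
```

Because the computed F_bar satisfies F_bar M = I, a noise-free observation reproduces x exactly, and S_bar agrees with F_bar R F_bar~ as stated in (0.1.3).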


Proof A.  From the hypotheses and definitions,

    $E(x_{F,g} \mid x) = E\{(Fy + g) \mid x\} = E\{F(Mx + v) + g \mid x\} = FMx + g$,

which is identically x if and only if g = 0, FM = I.
Noting from (0.0.6) and FM = I that
$\bar{F}R\tilde{F} = (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1} \cdot R\tilde{F} = (\tilde{M}R^{-1}M)^{-1}$,
we have

    $(F - \bar{F}) R (F - \bar{F})^{\sim} = FR\tilde{F} - \bar{F}R\bar{F}^{\sim}$.

This relation, with (0.2.1) and (0.2.2), yields

    $S_{F,g} - \bar{S} = FR\tilde{F} + g\tilde{g} - \bar{F}R\bar{F}^{\sim} = (F - \bar{F}) R (F - \bar{F})^{\sim} + g\tilde{g}$,

which is (0.1.4). $\bar{S}$ is minimum over $S_{F,g}$ since
$S_{F,g} - \bar{S}$ is clearly non-negative definite.

Proof C.  By (0.0.5), (0.0.6), and (0.0.1),

(0.2.6)  $\bar{x} - x = \bar{F}y - x = \bar{F}(Mx + v) - x = \bar{F}v$,

and further using (0.1.2),

(0.2.7)  $y - M\bar{x} = Mx + v - M\bar{F}(Mx + v) = (I - M\bar{F})v$.

Since v has mean zero for all x, by hypothesis, (0.2.7) shows
that $y - M\bar{x}$ has mean zero, and (0.2.6) shows that $\bar{x} - x$ is
orthogonal to x.

To prove orthogonality of $\bar{x} - x$ and $y - M\bar{x}$, we compute from
(0.2.6), (0.2.7), (0.0.2), and (0.0.6),

    $E(y - M\bar{x})(\bar{x} - x)^{\sim} = (I - M\bar{F})(E v\tilde{v})\bar{F}^{\sim}$
        $= (I - M\bar{F}) R R^{-1} M(\tilde{M}R^{-1}M)^{-1}$
        $= (M - M)(\tilde{M}R^{-1}M)^{-1} = 0$.

From (0.2.7), (0.0.2), and (0.0.6),

    $E(y - M\bar{x})(y - M\bar{x})^{\sim} = (I - M\bar{F})(E v\tilde{v})(I - M\bar{F})^{\sim}$
        $= R - R\bar{F}^{\sim}\tilde{M} - M\bar{F}R + M\bar{F}R\bar{F}^{\sim}\tilde{M}$
        $= R - M(\tilde{M}R^{-1}M)^{-1}\tilde{M}$.

Proof D.  From (0.2.7) and (0.0.6),

    $\tilde{M}R^{-1}(y - M\bar{x}) = \tilde{M}R^{-1}(I - M\bar{F})v$
        $= \tilde{M}R^{-1}[I - M(\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}]v$
        $= (\tilde{M}R^{-1} - \tilde{M}R^{-1})v = 0$,

from which it follows that

(0.2.8)  $(y - Mx)^{\sim}R^{-1}(y - Mx) = [y - M\bar{x} + M(\bar{x} - x)]^{\sim} R^{-1} [y - M\bar{x} + M(\bar{x} - x)]$
        $= (y - M\bar{x})^{\sim} R^{-1} (y - M\bar{x}) + [M(\bar{x} - x)]^{\sim} R^{-1} [M(\bar{x} - x)]$,

proving assertion D, since the scalar term
$[M(\bar{x} - x)]^{\sim} R^{-1} [M(\bar{x} - x)]$ is clearly non-negative.

Proof E.  We assume, without loss of generality, that the nonsingular
transformation of y is homogeneous, with matrix H. Then y, M,
and v are transformed into Hy, HM, Hv, and R into
$E(Hv)(Hv)^{\sim} = HR\tilde{H}$. Thus
$\bar{x} \triangleq (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}y$ is transformed
into $[(HM)^{\sim}(HR\tilde{H})^{-1}(HM)]^{-1}(HM)^{\sim}(HR\tilde{H})^{-1}Hy$,
which is $\bar{x}$ since $(HR\tilde{H})^{-1} = \tilde{H}^{-1}R^{-1}H^{-1}$.
Proof F.  By conclusion A, every absolutely unbiased linear estimate of
Ax is of the form Gy, with GM = A. Let $G = A\bar{F} + B$; since
$\bar{F}M = I$, GM = A implies BM = 0. The variance of Gy is, by
(0.2.1),

(0.2.9)  $GR\tilde{G} = (A\bar{F} + B) R (A\bar{F} + B)^{\sim} = A\bar{F}R\bar{F}^{\sim}\tilde{A} + BR\tilde{B}$,

since from (0.0.6) and BM = 0, one has

    $\bar{F}R\tilde{B} = (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}R\tilde{B} = (\tilde{M}R^{-1}M)^{-1}\tilde{M}\tilde{B} = 0$.

$BR\tilde{B}$ being non-negative definite, the variance is minimized by
$\overline{(Ax)} = A\bar{F}y = A\bar{x}$.

Corollary 1 to Theorem 0. If v is transformed into $\gamma v$,
$0 < \gamma^2 < \infty$, $\bar{F}$ is unchanged and $\bar{S}$ is changed by
the factor $\gamma^2$.

Proof of Corollary 1.

This is evident from inspection of (0.0.2), (0.0.6), and (0.1.3).

Corollary 2 to Theorem 0. If v is normally distributed, then $\bar{x} - x$
and $y - M\bar{x}$ are normally distributed and independent;
$(y - M\bar{x})^{\sim} R^{-1} (y - M\bar{x})$ is distributed as $\chi^2$ with
degrees of freedom equal to dimensionality of y minus dimensionality of x;
and $\bar{x}$ is the maximum likelihood estimate of x, if R does not
depend on x.
Proof of Corollary 2.  By (0.2.6), (0.2.7), $\bar{x} - x$ and
$y - M\bar{x}$ are linear combinations of v, hence normal, and are
independent since orthogonal by conclusion C.

$\bar{x}$ is the maximum likelihood estimate by conclusion D, since the
likelihood of the observations is proportional to

    $\exp[-(y - Mx)^{\sim} R^{-1} (y - Mx)/2]$.

The quadratic form of (0.2.8),

    $\tilde{v}R^{-1}v = (y - M\bar{x})^{\sim} R^{-1} (y - M\bar{x}) + (\bar{x} - x)^{\sim} (\tilde{M}R^{-1}M) (\bar{x} - x)$,

is the sum of squares of orthonormalized components of v, while the second
term on the right is the sum of squares of orthonormalized components of
$\bar{x} - x$. Thus the first term on the right is the sum of squares of
orthonormalized variables, the number of which is the dimensionality of v
minus the dimensionality of x. These variables are linear combinations
of $y - M\bar{x}$, hence are normal, so the sum is distributed as $\chi^2$.


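Corollary 3 to Theorem 0. If

    $R = \begin{pmatrix} R_1 & & & 0 \\ & R_2 & & \\ & & \ddots & \\ 0 & & & R_\nu \end{pmatrix}$,

with

    $M = \begin{pmatrix} M_1 \\ M_2 \\ \vdots \\ M_\nu \end{pmatrix}$  and  $y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_\nu \end{pmatrix}$

corresponding partitions of M and y, then if
$\tilde{M}_\alpha R_\alpha^{-1} M_\alpha$ is nonsingular,
$\alpha = 1, 2, \ldots, \nu$,

(0.3.1)  $\bar{x} = \left( \sum_{\alpha=1}^{\nu} \bar{S}_\alpha^{-1} \right)^{-1} \sum_{\beta=1}^{\nu} \bar{S}_\beta^{-1} \bar{x}_\beta$,

(0.3.2)  $\bar{S} = \left( \sum_{\alpha=1}^{\nu} \bar{S}_\alpha^{-1} \right)^{-1}$,

where

(0.3.3)  $\bar{x}_\alpha \triangleq (\tilde{M}_\alpha R_\alpha^{-1} M_\alpha)^{-1} \tilde{M}_\alpha R_\alpha^{-1} y_\alpha$,

(0.3.4)  $\bar{S}_\alpha \triangleq (\tilde{M}_\alpha R_\alpha^{-1} M_\alpha)^{-1}$.

Proof.  Immediate, by substitution into (0.0.5), (0.1.3), since

    $\tilde{M}R^{-1}M = (\tilde{M}_1, \tilde{M}_2, \ldots, \tilde{M}_\nu) \begin{pmatrix} R_1^{-1} & & 0 \\ & \ddots & \\ 0 & & R_\nu^{-1} \end{pmatrix} \begin{pmatrix} M_1 \\ \vdots \\ M_\nu \end{pmatrix} = \sum_{\alpha=1}^{\nu} \tilde{M}_\alpha R_\alpha^{-1} M_\alpha$,

and $\tilde{M}R^{-1}y$ decomposes similarly.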
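
The combination rule of Corollary 3 can be checked numerically. The sketch below forms the block estimates (0.3.3), (0.3.4), combines them by (0.3.1), (0.3.2), and compares the result with a direct evaluation of (0.0.5), (0.1.3) on the stacked observation. The two blocks, their numerical values, and the helper names `block_estimate` and `combine` are hypothetical and serve only as an illustration.

```python
import numpy as np

def block_estimate(M_a, R_a, y_a):
    """(0.3.3), (0.3.4): estimate and variance from one block of observations."""
    R_inv = np.linalg.inv(R_a)
    S_a = np.linalg.inv(M_a.T @ R_inv @ M_a)
    return S_a @ M_a.T @ R_inv @ y_a, S_a

def combine(blocks):
    """(0.3.1), (0.3.2): weight each block estimate by its inverse variance."""
    info = sum(np.linalg.inv(S_a) for _, S_a in blocks)
    S_bar = np.linalg.inv(info)
    x_bar = S_bar @ sum(np.linalg.inv(S_a) @ x_a for x_a, S_a in blocks)
    return x_bar, S_bar

# Hypothetical two-block example with a 2-component x.
M1, R1 = np.array([[1.0, 0.0], [0.0, 1.0]]), np.diag([1.0, 2.0])
M2, R2 = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]]), np.diag([0.5, 0.5, 1.0])
y1, y2 = np.array([0.9, 2.2]), np.array([3.1, -1.0, 2.1])

x_bar, S_bar = combine([block_estimate(M1, R1, y1), block_estimate(M2, R2, y2)])

# Direct evaluation of (0.0.5), (0.1.3) on the stacked observation.
M = np.vstack([M1, M2])
R = np.block([[R1, np.zeros((2, 3))], [np.zeros((3, 2)), R2]])
y = np.concatenate([y1, y2])
S_direct = np.linalg.inv(M.T @ np.linalg.inv(R) @ M)
x_direct = S_direct @ M.T @ np.linalg.inv(R) @ y
```

The two routes agree because the inverse variances of the blocks add, which is the content of (0.3.2).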


The hypothesis that the vector v of errors of observation be of zero
mean for every x is essential; without it, the estimate is not absolutely
unbiased. We warn the reader of two forms of estimate often confused with
$\bar{x}$: (1) the estimate minimizing $(y - Mx^*)^{\sim}(y - Mx^*)$, the
sum of squares of the residuals; (2) the maximum likelihood estimate
minimizing the weighted sum of squares of residuals when the observations
are not linear in x.

The first estimate is absolutely unbiased but, in general, has greater
variance of estimate than $\bar{x}$; the second is in general not even
unbiased.

Case III.1.a.  A frequent special case is that in which errors of
observation are uncorrelated, with common variance $\sigma^2$. Then R is
scalar ($R = \sigma^2 I$), $\bar{F} = (\tilde{M}M)^{-1}\tilde{M}$,
$\bar{S} = \sigma^2(\tilde{M}M)^{-1}$. If the errors are further assumed to
be normally distributed, $(y - M\bar{x})^{\sim}(y - M\bar{x})$ is a
sufficient statistic for $\sigma^2$.

Case III.1.b.  Another frequent special case is that in which M is
nonsingular, so that x and y have equal dimensions. Then $\bar{x}$ is the
only absolutely unbiased estimate (T. W. Anderson, 1962), and unless R is
known, the variance of estimate cannot be calculated.


Example III.1.c.  Shinbrot (1956) gives the following problem, which we
shall consider exhaustively: Given successive observations of
$\xi + \epsilon$, $\xi$ constant over the sequence of observations and
uniformly distributed between $\xi_1$ and $\xi_2 > \xi_1$, $\epsilon$
uniformly distributed between $-\lambda$ and $+\lambda$, $\lambda > 0$,
successive $\epsilon$ mutually independent and independent of $\xi$.

Here, we calculate $\bar{x}$. We have

    $y = \begin{pmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_\mu \end{pmatrix} = 1\,\xi + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_\mu \end{pmatrix}$,

so that $x = \xi$, $M = 1$, $R = (E\epsilon^2) I = (\lambda^2/3) I$. Thus

    $\bar{x} = (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}y = \frac{1}{\mu} \sum_{\alpha=1}^{\mu} \eta_\alpha$,

    $\bar{S} = (\tilde{M}R^{-1}M)^{-1} = \lambda^2/(3\mu)$.

The best absolutely unbiased estimate of $\xi$ is nonlinear in this case,
being half the sum of the largest and the smallest observation. The
variance of the best nonlinear estimate decreases as $\mu^{-2}$, rather than
$\mu^{-1}$, so that the linear estimate has an efficiency tending to zero as
$\mu$ increases. (Carlton, 1946).
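
The loss of efficiency in Example III.1.c can be tabulated directly. The sketch below uses the variance of the sample mean, $\lambda^2/(3\mu)$, together with the standard order-statistics result that the midrange of $\mu$ independent errors uniform on $(-\lambda, +\lambda)$ has variance $2\lambda^2/((\mu+1)(\mu+2))$; that closed form is quoted here as an assumption and does not appear in the text above. The function names are hypothetical.

```python
from fractions import Fraction

def var_sample_mean(lam, mu):
    """Variance of the linear estimate (the sample mean): lambda^2 / (3 mu)."""
    return Fraction(lam) ** 2 / (3 * mu)

def var_midrange(lam, mu):
    """Variance of half the sum of the largest and smallest of mu
    independent errors uniform on (-lambda, +lambda)."""
    return 2 * Fraction(lam) ** 2 / ((mu + 1) * (mu + 2))

def efficiency(mu):
    """Ratio var_midrange / var_sample_mean; it does not depend on lambda."""
    return var_midrange(1, mu) / var_sample_mean(1, mu)

for mu in (1, 2, 10, 100, 1000):
    print(mu, float(efficiency(mu)))
```

Under these formulas the ratio equals $6\mu/((\mu+1)(\mu+2))$, which is 1 for one or two observations and falls off like $6/\mu$ for large $\mu$.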

III.2 Best Linear Estimate, $\hat{x}$

Given the first two moments of x and y, one can calculate $\hat{x}$ without
requiring linearity of y in x. The following theorem is stated for zero
means of x and y, but is general since x and y can be transformed to zero
mean. (In Theorem 1, y is a column vector of $\mu$ components; x,
$x_{F,g}$, g, and $\hat{x}$ are column vectors of $\nu$ components. F and
$\hat{F}$ are $\nu \times \mu$; $S_{F,g}$ and $\hat{S}$ are
$\nu \times \nu$; A is $\sigma \times \nu$; $(E y\tilde{y})$ and H are
$\mu \times \mu$; $(E x\tilde{y})$ is $\nu \times \mu$.)

Theorem 1. Let x and y be zero mean random variables with finite second
moments, $E y\tilde{y}$ nonsingular, with F and g constant.

A. The linear estimate of x,

(1.0.1)  $x_{F,g} \triangleq Fy + g$

which minimizes the variance of estimate

(1.0.2)  $S_{F,g} \triangleq E(x_{F,g} - x)(x_{F,g} - x)^{\sim}$

is

(1.0.3)  $\hat{x} \triangleq \hat{F}y$,

with

(1.0.4)  $\hat{F} \triangleq (E x\tilde{y})(E y\tilde{y})^{-1}$.

$\hat{x}$ is an unbiased estimate of x, i.e.,

(1.1.1)  $E\hat{x} - Ex = 0$.

The minimum variance is

(1.1.2)  $\hat{S} \triangleq S_{\hat{F},\hat{g}} = E x\tilde{x} - E x\tilde{y}\,(E y\tilde{y})^{-1} E y\tilde{x}$,

and the variance of any linear estimate is

(1.1.3)  $S_{F,g} = \hat{S} + (F - \hat{F})\, E y\tilde{y}\, (F - \hat{F})^{\sim} + g\tilde{g}$.

B. $x - \hat{x}$ is orthogonal to y and to $\hat{x}$.

C. $\hat{x}$ is invariant under nonsingular linear transformation of y.

D. The minimum variance linear estimate of Ax is $A\hat{x}$, for A any
constant.

Proof A.  From (1.0.2) and (1.0.1), and hypothesized zero means of x and y,

(1.2.1)  $S_{F,g} = E(Fy + g - x)(Fy + g - x)^{\sim} = E(Fy - x)(Fy - x)^{\sim} + g\tilde{g}$
        $= F(E y\tilde{y})\tilde{F} - F(E y\tilde{x}) - (E x\tilde{y})\tilde{F} + E x\tilde{x} + g\tilde{g}$.

Substituting (1.0.3) and (1.0.4), i.e., $\hat{g} = 0$,
$\hat{F} = (E x\tilde{y})(E y\tilde{y})^{-1}$,

(1.2.2)  $\hat{S} = E x\tilde{x} - (E x\tilde{y})(E y\tilde{y})^{-1}(E y\tilde{x})$.

Subtracting (1.2.2) from (1.2.1), again using (1.0.4),

(1.2.3)  $S_{F,g} - \hat{S} = g\tilde{g} + F(E y\tilde{y})\tilde{F} - F(E y\tilde{y})\hat{F}^{\sim} - \hat{F}(E y\tilde{y})\tilde{F} + \hat{F}(E y\tilde{y})\hat{F}^{\sim}$
        $= g\tilde{g} + (F - \hat{F})(E y\tilde{y})(F - \hat{F})^{\sim}$.

Since (1.2.3) is clearly non-negative definite, $\hat{x}$ minimizes
the variance.

Since y has zero mean, (1.0.3) shows that $\hat{x}$ has zero
mean, i.e., is unbiased.
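
Conclusion A of Theorem 1 lends itself to a direct numerical check. In the sketch below the joint second moments of (x, y) are generated from a random factor, purely as a hypothetical test case; `best_linear_estimate` and `variance_of_estimate` are illustrative names. The code computes (1.0.4) and (1.1.2) and records the excess variance of a perturbed F so that it can be compared with the right side of (1.2.3).

```python
import numpy as np

def best_linear_estimate(Exy, Eyy, Exx):
    """Return (F_hat, S_hat) from (1.0.4) and (1.1.2)."""
    F_hat = Exy @ np.linalg.inv(Eyy)
    S_hat = Exx - F_hat @ Exy.T          # E y x~ is the transpose of E x y~
    return F_hat, S_hat

def variance_of_estimate(F, Exy, Eyy, Exx):
    """S_{F,0} of (1.2.1) for the estimate F y (that is, g = 0)."""
    return F @ Eyy @ F.T - F @ Exy.T - Exy @ F.T + Exx

# Hypothetical joint second moments of the stacked vector (x, y), built
# from a random factor W so that they form a valid moment matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))
C = W @ W.T
Exx, Exy, Eyy = C[:2, :2], C[:2, 2:], C[2:, 2:]

F_hat, S_hat = best_linear_estimate(Exy, Eyy, Exx)
F_other = F_hat + 0.1 * rng.standard_normal(F_hat.shape)
excess = variance_of_estimate(F_other, Exy, Eyy, Exx) - S_hat
```

Here `excess` should equal (F - F_hat) E y y~ (F - F_hat)~ and therefore be non-negative definite, which is the minimizing property claimed for $\hat{F}$.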
Proof B.  From (1.0.3) and (1.0.4),

(1.2.4)  $E(x - \hat{x})\tilde{y} = E x\tilde{y} - E\hat{F}y\tilde{y}$
        $= E x\tilde{y} - (E x\tilde{y})(E y\tilde{y})^{-1}(E y\tilde{y}) = 0$.

Thus $x - \hat{x}$ is orthogonal to y and, by (1.0.3), to $\hat{x}$.

Proof C.  Without loss of generality, let the transformed y be Hy, H
nonsingular by hypothesis.

Then, from (1.0.4), $\hat{F}$ is transformed into
$[E x(Hy)^{\sim}][E(Hy)(Hy)^{\sim}]^{-1} = (E x\tilde{y})\tilde{H}[H(E y\tilde{y})\tilde{H}]^{-1} = \hat{F}H^{-1}$,
and from (1.0.3), $\hat{x}$ is transformed into
$(\hat{F}H^{-1})(Hy) = \hat{F}y = \hat{x}$.

Proof D.  From (1.0.3) and (1.0.4),

    $\widehat{(Ax)} = (E Ax\tilde{y})(E y\tilde{y})^{-1}y = A(E x\tilde{y})(E y\tilde{y})^{-1}y = A\hat{x}$.

Case III.2.a.  Let

(1.3.1)  $y = Mx + v$,

v having zero mean and variance

(1.3.2)  $R \triangleq E v\tilde{v}$,

with M constant. Define

(1.3.3)  $S \triangleq E x\tilde{x}$,

(1.3.4)  $T \triangleq E x\tilde{v}$.

Then

(1.4.1)  $\hat{F} = (S\tilde{M} + T)(MS\tilde{M} + MT + \tilde{T}\tilde{M} + R)^{-1}$;

(1.4.2)  $\hat{S} = S - (S\tilde{M} + T)(MS\tilde{M} + MT + \tilde{T}\tilde{M} + R)^{-1}(MS + \tilde{T})$.

These results follow from evaluation of $E x\tilde{y}$ as $S\tilde{M} + T$,
$E y\tilde{y}$ as $MS\tilde{M} + MT + \tilde{T}\tilde{M} + R$.

III.3 Comparison of $\bar{x}$ and $\hat{x}$

Suppose we have observations y = Mx + v, with hypotheses of Theorem 0
and Theorem 1 simultaneously satisfied, with S nonsingular. Thus in the
special case III.2.a, $\hat{S} = S - S\tilde{M}(MS\tilde{M} + R)^{-1}MS$,
since T = 0, while from Theorem 0, $\bar{S} = (\tilde{M}R^{-1}M)^{-1}$. To
compare these variances of estimate, we use the following lemma.

Lemma A.  For any real matrix K,

    $I - K(\tilde{K}K + I)^{-1}\tilde{K} \equiv (K\tilde{K} + I)^{-1}$.

Proof.  K being a real matrix, the indicated inverses surely exist. The
identity can be verified by power series expansion, or developed from the
evident identity

    $K\tilde{K}K + K \equiv K\tilde{K}K + K$.

Factoring each side, we obtain

    $K(\tilde{K}K + I) \equiv (K\tilde{K} + I)K$.

From this identity, the desired identity is obtained by post-multiplying
each side by $(\tilde{K}K + I)^{-1}\tilde{K}$, subtracting each side from
$K\tilde{K} + I$, and pre-multiplying by $(K\tilde{K} + I)^{-1}$. (The
identity is valid for any K such that the indicated inverses exist.)

Using Lemma A with $S^{1/2}\tilde{M}R^{-1/2}$ playing the role of K, we have

    $(\bar{S}^{-1} + S^{-1})^{-1} = (\tilde{M}R^{-1}M + S^{-1})^{-1}$

        $= S^{1/2}(S^{1/2}\tilde{M}R^{-1}MS^{1/2} + I)^{-1}S^{1/2}$

        $= S^{1/2}\{I - S^{1/2}\tilde{M}R^{-1/2}(R^{-1/2}MS\tilde{M}R^{-1/2} + I)^{-1}R^{-1/2}MS^{1/2}\}S^{1/2}$

        $= S - S\tilde{M}(MS\tilde{M} + R)^{-1}MS = \hat{S}$.

In short, the inverse of $\hat{S}$ is the sum of the inverses of S and of
$\bar{S}$.

Consider a sequence of values of $S^{-1}$ tending to the limit zero.
The corresponding sequence of values of $\hat{S}$ must tend to $\bar{S}$
(in fact, it can be shown that $\bar{S} - \hat{S} \le \epsilon\bar{S}$, if
$S^{-1} \le \epsilon\bar{S}^{-1}$). From Theorem 1, Conclusion A, (1.1.3),
since $\bar{F}y = \bar{x}$ and $\hat{F}y = \hat{x}$, we have that

    $\bar{S} = \hat{S} + E(\bar{x} - \hat{x})(\bar{x} - \hat{x})^{\sim}$,

so that as $\hat{S}$ tends to $\bar{S}$, $\hat{x}$ must tend to $\bar{x}$
in quadratic mean. Thus $\bar{x}$ can be considered a limiting value of
$\hat{x}$.

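To show more directly the approach of $\hat{x}$ to $\bar{x}$ as $S^{-1}$
tends to the limit zero, we can modify the identity of Lemma A as follows:

    $(K\tilde{K} + I)^{-1}K = K - K(\tilde{K}K + I)^{-1}\tilde{K}K$

        $= K\{I - (\tilde{K}K + I)^{-1}\tilde{K}K\} = K(\tilde{K}K + I)^{-1}\{(\tilde{K}K + I) - \tilde{K}K\}$

        $= K(\tilde{K}K + I)^{-1}$.

Using this identity, we have

    $\hat{F} = S\tilde{M}(MS\tilde{M} + R)^{-1}$

        $= S^{1/2}(S^{1/2}\tilde{M}R^{-1/2})(R^{-1/2}MS\tilde{M}R^{-1/2} + I)^{-1}R^{-1/2}$

        $= S^{1/2}(S^{1/2}\tilde{M}R^{-1}MS^{1/2} + I)^{-1}S^{1/2}\tilde{M}R^{-1}$

        $= (\tilde{M}R^{-1}M + S^{-1})^{-1}\tilde{M}R^{-1}$,

which clearly tends to $\bar{F} = (\tilde{M}R^{-1}M)^{-1}\tilde{M}R^{-1}$ as
$S^{-1}$ tends to zero, so that $\hat{x} = \hat{F}y$ tends to
$\bar{x} = \bar{F}y$.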
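
The two results of this subsection, that the inverse of $\hat{S}$ is the sum of the inverses of S and $\bar{S}$, and that $\hat{F}$ approaches $\bar{F}$ as $S^{-1}$ tends to zero, can be illustrated numerically. The matrices M, R and S below are hypothetical, the case T = 0 of III.2.a is assumed, and scaling S by a growing constant is only one convenient way of driving $S^{-1}$ toward zero.

```python
import numpy as np

# Hypothetical M, R and second moment S of x, with T = E x v~ = 0.
M = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
R = np.diag([1.0, 2.0, 0.5])
S = np.array([[3.0, 1.0], [1.0, 2.0]])
R_inv = np.linalg.inv(R)

def f_hat(S):
    """F_hat of Case III.2.a with T = 0: S M~ (M S M~ + R)^-1."""
    return S @ M.T @ np.linalg.inv(M @ S @ M.T + R)

S_hat = S - f_hat(S) @ M @ S                 # (1.4.2) with T = 0
S_bar = np.linalg.inv(M.T @ R_inv @ M)       # (0.1.3)
F_bar = S_bar @ M.T @ R_inv                  # (0.0.6)

# Sum of the inverses of S and S_bar, to be compared with the inverse of S_hat.
info_sum = np.linalg.inv(S) + np.linalg.inv(S_bar)

# Scaling S up drives S^-1 toward zero; record how far F_hat is from F_bar.
gaps = [np.abs(f_hat(c * S) - F_bar).max() for c in (1.0, 10.0, 100.0, 1000.0)]
```

The entries of `gaps` shrink roughly in proportion to the scale factor, consistent with the closed form $(\tilde{M}R^{-1}M + S^{-1})^{-1}\tilde{M}R^{-1}$ for $\hat{F}$.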


III.4 Sequential Calculation of $(\hat{x}, \hat{S})$

With observations linear in x and the structure of x suitably linear,
it is possible to exploit the orthogonality of $x - \hat{x}$ to $\hat{x}$
and calculate $\hat{x}$ and $\hat{S}$ sequentially. We denote by
$x_1, x_2, \cdots, x_\nu$ the values of the process x(t) at the epochs of
the successive observations $y_1, y_2, \cdots, y_\nu$. The successive values
of x are assumed to satisfy the relation

    $x_\alpha = L_{\alpha,\alpha-1} x_{\alpha-1} + v_\alpha$,

with the $L_{\alpha,\alpha-1}$ specified matrices, $v_\alpha$ random
disturbances. This relation implies that $x_\alpha$ is a complete state
vector, that the x-process is a vector Markov process in the wide sense. The
following theorem shows how one can predict $\hat{x}$ and $\hat{S}$, how
transforming the observations by subtraction of predicted value of
observations orthogonalizes them, and how to calculate $\hat{x}$ and its
variance sequentially. It is assumed that one has a best estimate of
$x_0$, and known variance of estimate, at some epoch prior to the first
observation $y_1$. (In Theorem 2, $y_\alpha$ and $z_\alpha$ are column
vectors with $\mu_\alpha$ components; $x_\alpha$, $v_\alpha$,
$\hat{x}_{\alpha,\beta}$ and $u_\alpha$ are column vectors of
$\sigma_\alpha$ components. (Typically, but not necessarily,
$\sigma_1 = \sigma_2 = \cdots = \sigma_\nu$.) Letting $\mu$ represent
$\sum_{\alpha=1}^{\nu} \mu_\alpha$, and $\sigma$ represent
$\sum_{\alpha=1}^{\nu} \sigma_\alpha$, one has that x and v are
$\sigma \times 1$; y is $\mu \times 1$; $L_{\beta,\alpha}$ is
$\sigma_\beta \times \sigma_\alpha$; $E y\tilde{y}$ is $\mu \times \mu$;
$A_\alpha$ is $\mu_\alpha \times \sigma_\alpha$; and $S_{\alpha,\beta}$ is
$\sigma_\alpha \times \sigma_\alpha$.)

Theorem 2.  Let the random variables

    $x \triangleq x(\nu) \triangleq (x_1, x_2, \cdots, x_\nu)$,

    $y \triangleq y(\nu) \triangleq (y_1, y_2, \cdots, y_\nu)$,

    $v \triangleq v(\nu) \triangleq (v_1, v_2, \cdots, v_\nu)$,

satisfy the relations, for $\alpha = 1, 2, \cdots, \nu$,

(2.0.1)  $x_\alpha = L_{\alpha,\alpha-1} x_{\alpha-1} + v_\alpha$,

(2.0.2)  $y_\alpha = A_\alpha x_\alpha$,

with $L_{\alpha,\alpha-1}$ and $A_\alpha$ constants; $v_\alpha$ mutually
orthogonal zero mean random variables; second moments of v and x finite;
$E y\tilde{y}$ nonsingular. Denote by $\hat{x}_{\alpha,\beta}$ the minimum
variance linear estimate of $x_\alpha$ based on $y(\beta)$; assume v
orthogonal to $\hat{x}_{0,0}$ and to $x_0$.


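(2.0.3)  $L_{\beta,\alpha} \triangleq L_{\beta,\beta-1} L_{\beta-1,\beta-2} \cdots L_{\alpha+1,\alpha}$,

(2.0.4)  $S_{\alpha,\beta} \triangleq E(x_\alpha - \hat{x}_{\alpha,\beta})(x_\alpha - \hat{x}_{\alpha,\beta})^{\sim}$,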

(2.0.5) 

fl 

a,  p 

4  *(*a  -  ^a,B> 

(2.0.6)  $u_\alpha \triangleq x_\alpha - \hat{x}_{\alpha,\alpha-1}$,

(2.0.7) 

K  ^ 

Cl  e 

A 

“a, a 

(2.0.8)  $z_\alpha \triangleq A_\alpha u_\alpha$.

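Conclusion A.  For $\alpha < \beta$,

(2.1.1)  $\hat{x}_{\beta,\alpha} = L_{\beta,\alpha} \hat{x}_{\alpha,\alpha}$,

(2.1.2)  $S_{\beta,\alpha} = L_{\beta,\alpha} S_{\alpha,\alpha} \tilde{L}_{\beta,\alpha} + \sum_{\gamma=\alpha+1}^{\beta} L_{\beta,\gamma} (E v_\gamma\tilde{v}_\gamma) \tilde{L}_{\beta,\gamma}$,

where $L_{\beta,\beta}$ is understood to be I.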

Conclusion B.  The linear transformation of y into z, defined by (2.0.6)
and (2.0.8), is nonsingular. The vectors $z_1, z_2, \cdots, z_\nu$ are
mutually orthogonal, with mean zero and variance

(2.1.3)  $E z_\alpha\tilde{z}_\alpha = A_\alpha S_{\alpha,\alpha-1} \tilde{A}_\alpha$.

For $\alpha < \beta$, $z_\alpha$ is orthogonal to $u_\beta$.
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Conclusion  C.  Estimates  ft,  , .  ••  .ft  and  their  variances  can  be 

obtained  sequentially  by  use  of  (2.1.1)  and  (2.1.2)  with 
0  =  a  +  1,  and 


(2.1.4) 

(2.1.5) 

(2.1.6) 


^a,a  ^a,a-l  ’ 


u  =  (Eu  z  )  (Ez  z  )”^  z 
“a  '  “a  a  'a  a  a  ' 


a  ”  a  1  *  ^a  a  l  ^a  ^^a^a  a  l^a^  ^a^a  a  l 


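Conclusion D. Estimates $\hat{x}_{\alpha,\alpha+1}, \hat{x}_{\alpha,\alpha+2}, \ldots, \hat{x}_{\alpha,\nu}$ and associated variances can be calculated sequentially using the additional relations, for $\alpha < \beta$,

(2.1.7)    $\hat{x}_{\alpha,\beta} = \hat{x}_{\alpha,\beta-1} + \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T A_\beta^T (E z_\beta z_\beta^T)^{-1} z_\beta$,

(2.1.8)    $\Omega_{\alpha,\beta} = \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T \left[ I - A_\beta^T (E z_\beta z_\beta^T)^{-1} A_\beta S_{\beta,\beta-1} \right]$,

(2.1.9)    $S_{\alpha,\beta} = S_{\alpha,\beta-1} - \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T A_\beta^T (E z_\beta z_\beta^T)^{-1} A_\beta L_{\beta,\beta-1} \Omega_{\alpha,\beta-1}^T$.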

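Proof A. The hypotheses of Theorem 1 are satisfied, assuming, without loss of generality, that $\hat{x}_{0,0} = E x_0 = 0$.

Repeatedly using (2.0.1) and (2.0.3),

(2.2.1)    $x_\beta = v_\beta + L_{\beta,\beta-1} x_{\beta-1} = v_\beta + L_{\beta,\beta-1} v_{\beta-1} + L_{\beta,\beta-2} x_{\beta-2} = \cdots = v_\beta + L_{\beta,\beta-1} v_{\beta-1} + L_{\beta,\beta-2} v_{\beta-2} + \cdots + L_{\beta,\alpha+1} v_{\alpha+1} + L_{\beta,\alpha} x_\alpha$.

By Conclusion D of Theorem 1, the estimate of $x_\beta$ based on $y_{(\alpha)}$ is the same linear combination of the estimates of the terms of (2.2.1). By hypothesis, $v_{\alpha+1}, \ldots, v_\beta$ have mean zero and are orthogonal to $x_0$ and $v_1, v_2, \ldots, v_\alpha$, hence from (2.0.1) and (2.0.2) to $y_{(\alpha)}$, so that $\hat{x}_{\beta,\alpha} = L_{\beta,\alpha} \hat{x}_{\alpha,\alpha}$, which is (2.1.1).

From (2.0.4), (2.1.1), and (2.2.1), using orthogonality of the components of $v$,

    $S_{\beta,\alpha} = E(x_\beta - \hat{x}_{\beta,\alpha})(x_\beta - \hat{x}_{\beta,\alpha})^T = E v_\beta v_\beta^T + L_{\beta,\beta-1}\, E v_{\beta-1} v_{\beta-1}^T\, L_{\beta,\beta-1}^T + \cdots + L_{\beta,\alpha+1}\, E v_{\alpha+1} v_{\alpha+1}^T\, L_{\beta,\alpha+1}^T + L_{\beta,\alpha} S_{\alpha,\alpha} L_{\beta,\alpha}^T$,

which is (2.1.2).

Proof B. From (2.0.8), (2.0.6) and (2.0.2),

(2.2.2)    $z_\alpha = y_\alpha - A_\alpha \hat{x}_{\alpha,\alpha-1}$,

with $\hat{x}_{\alpha,\alpha-1}$ by definition a linear combination of $y_1, y_2, \ldots, y_{\alpha-1}$. Thus the transformation matrix is triangular with unity diagonal elements, hence nonsingular. $E z_\alpha = 0$ since by Theorem 1, $\hat{x}_{\alpha,\alpha-1}$ is an unbiased estimate of $x_\alpha$.

From (2.2.2) and (2.0.4),

    $E z_\alpha z_\alpha^T = A_\alpha\, E(x_\alpha - \hat{x}_{\alpha,\alpha-1})(x_\alpha - \hat{x}_{\alpha,\alpha-1})^T A_\alpha^T = A_\alpha S_{\alpha,\alpha-1} A_\alpha^T$,

which is (2.1.3).

From (2.0.6), (2.0.1), and (2.1.1),

    $u_\beta = x_\beta - \hat{x}_{\beta,\beta-1} = L_{\beta,\beta-1} x_{\beta-1} + v_\beta - L_{\beta,\beta-1} \hat{x}_{\beta-1,\beta-1} = L_{\beta,\beta-1}(x_{\beta-1} - \hat{x}_{\beta-1,\beta-1}) + v_\beta$.

By Theorem 1, $x_{\beta-1} - \hat{x}_{\beta-1,\beta-1}$ is orthogonal to $y_{(\beta-1)}$, and $v_\beta$ was shown above to be orthogonal to $y_{(\beta-1)}$. Thus $u_\beta$ is orthogonal to $y_{(\beta-1)}$, hence to $z_{(\beta-1)}$, which is a linear combination of $y_{(\beta-1)}$. Thus $u_\beta$ is orthogonal to $z_\alpha$, $\alpha < \beta$, and since $z_\beta \triangleq A_\beta u_\beta$, the components of $z$ are mutually orthogonal.

Proof C. (2.1.4) follows from Theorem 1, since

    $x_\alpha = \hat{x}_{\alpha,\alpha-1} + u_\alpha$.

(2.1.5) is obtained from Theorem 1 and Conclusion B:

    $\hat{u}_\alpha \triangleq \hat{u}_{\alpha,\alpha} = (E u_\alpha z_{(\alpha)}^T)(E z_{(\alpha)} z_{(\alpha)}^T)^{-1} z_{(\alpha)} = (E u_\alpha z_\alpha^T)(E z_\alpha z_\alpha^T)^{-1} z_\alpha$.

(2.1.6) is obtained by using (1.1.2) with $x \triangleq x_\alpha - \hat{x}_{\alpha,\alpha-1}$, $y \triangleq z_\alpha$, i.e.,

    $S_{\alpha,\alpha} = S_{\alpha,\alpha-1} - (E u_\alpha z_\alpha^T)(E z_\alpha z_\alpha^T)^{-1}(E z_\alpha u_\alpha^T)$,

with $E z_\alpha z_\alpha^T = A_\alpha S_{\alpha,\alpha-1} A_\alpha^T$ by (2.1.3), and

    $E u_\alpha z_\alpha^T = E u_\alpha u_\alpha^T A_\alpha^T = S_{\alpha,\alpha-1} A_\alpha^T$

by (2.0.8), (2.0.6), and (2.0.4).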
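Proof D. From (2.0.1), (2.0.2), and hypotheses, $v_\beta$ is orthogonal to $x_\alpha - \hat{x}_{\alpha,\beta-1}$. By this orthogonality, (2.0.8), (2.0.6), and (2.0.5),

(2.2.5)    $E(x_\alpha - \hat{x}_{\alpha,\beta-1})\, u_\beta^T = E(x_\alpha - \hat{x}_{\alpha,\beta-1})(x_{\beta-1} - \hat{x}_{\beta-1,\beta-1})^T L_{\beta,\beta-1}^T = \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T$.

Writing $x_\alpha = \hat{x}_{\alpha,\beta-1} + (x_\alpha - \hat{x}_{\alpha,\beta-1})$, and using (2.2.5) and orthogonality of components of $z$, Theorem 1 gives

    $\hat{x}_{\alpha,\beta} = \hat{x}_{\alpha,\beta-1} + E(x_\alpha - \hat{x}_{\alpha,\beta-1})\, z_{(\beta)}^T (E z_{(\beta)} z_{(\beta)}^T)^{-1} z_{(\beta)} = \hat{x}_{\alpha,\beta-1} + \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T A_\beta^T (E z_\beta z_\beta^T)^{-1} z_\beta$,

which is (2.1.7).

To verify (2.1.8), write (2.1.7) as

(2.2.6)    $x_\alpha - \hat{x}_{\alpha,\beta} = x_\alpha - \hat{x}_{\alpha,\beta-1} - \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T A_\beta^T (E z_\beta z_\beta^T)^{-1} z_\beta$.

From (2.0.6), (2.0.7), (2.1.5),

(2.2.7)    $x_\beta - \hat{x}_{\beta,\beta} = u_\beta - \hat{u}_\beta = \left[ I - (E u_\beta z_\beta^T)(E z_\beta z_\beta^T)^{-1} A_\beta \right] u_\beta$.

Taking the expectation of (2.2.6) times the transpose of (2.2.7), and using (2.2.5) and (2.0.5),

(2.2.8)    $\Omega_{\alpha,\beta} = \Omega_{\alpha,\beta-1} L_{\beta,\beta-1}^T \left[ I - A_\beta^T (E z_\beta z_\beta^T)^{-1} (E z_\beta u_\beta^T) \right]^2$.

From the proof of C, $E z_\beta u_\beta^T = A_\beta S_{\beta,\beta-1}$, while from (2.1.3), $E z_\beta z_\beta^T = A_\beta S_{\beta,\beta-1} A_\beta^T$. Substituting these results in (2.2.8), the bracketed matrix being idempotent, yields (2.1.8).

Taking expectation of (2.2.6) times its transpose, using (2.0.4), (2.2.5), (2.0.8), one obtains (2.1.9).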

The theorem has been stated, in preparation for later application, in terms of a decomposition of the column vector $y$ into column vector components.

In application to a finite set of observations, however, one could decompose $y$ into scalar components, simplifying the computation by reducing to a scalar the matrix to be inverted.
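The scalar decomposition mentioned above can be sketched in a few lines. The following is a minimal illustration, not the report's own program: the function names, the two-component process, and the numerical values are assumptions chosen only for the example. With a scalar observation the matrix $E z_\alpha z_\alpha^T$ of (2.1.3) is a single number, so (2.1.5) and (2.1.6) need only a division.

```python
def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(P, R):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * R[k][j] for k in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(P))]


def predict(x_hat, S, L, V):
    """(2.1.1) and (2.1.2) with beta = alpha + 1; V stands for E v v^T."""
    Lt = [list(col) for col in zip(*L)]
    LSLt = mat_mul(mat_mul(L, S), Lt)
    n = len(V)
    S_pred = [[LSLt[i][j] + V[i][j] for j in range(n)] for i in range(n)]
    return mat_vec(L, x_hat), S_pred


def update_scalar(x_pred, S_pred, a, y):
    """(2.1.4)-(2.1.6) for one scalar observation y = a . x (a is a row)."""
    z = y - sum(ai * xi for ai, xi in zip(a, x_pred))    # z = y - a x_pred
    Sa = mat_vec(S_pred, a)                              # S a^T (S symmetric)
    ezz = sum(ai * si for ai, si in zip(a, Sa))          # (2.1.3), a scalar
    x_new = [xi + si * z / ezz for xi, si in zip(x_pred, Sa)]
    n = len(a)
    S_new = [[S_pred[i][j] - Sa[i] * Sa[j] / ezz for j in range(n)]
             for i in range(n)]
    return x_new, S_new


# Hypothetical two-component process; only the first component is observed.
x_pred, S_pred = predict([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                         [[1.0, 1.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.1]])
x_new, S_new = update_scalar(x_pred, S_pred, [1.0, 0.0], 3.0)
```

Because the observation in Theorem 2 carries no separate error term, the observed component is reproduced exactly and its variance of estimate drops to zero, while the unobserved component retains a positive variance.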


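Example III.4.a. (Prediction) An example from Wiener (1949) supposes two periodically observed signals with spectral densities $\phi_{11}(\omega) = \phi_{22}(\omega) = 1$ and cross-spectral densities of modulus $\epsilon$, with $\epsilon < 1$. We wish to predict each signal at future times of observation. Let the first signal be $\xi_1 = \eta_1$, the second signal $\xi_2 = \eta_2$. The given spectral densities imply

    $x_\alpha = y_\alpha = L_{\alpha,\alpha-1} x_{\alpha-1} + v_\alpha$,

with

    $L_{\alpha,\alpha-1} = \begin{pmatrix} 0 & \epsilon \\ 0 & 0 \end{pmatrix}$,    $E v_\alpha v_\alpha^T = \begin{pmatrix} 1-\epsilon^2 & 0 \\ 0 & 1 \end{pmatrix}$.

For $\beta > \alpha$,

    $\hat{x}_{\beta,\alpha} = L_{\beta,\alpha}\, \hat{x}_{\alpha,\alpha} = \begin{pmatrix} 0 & \epsilon \\ 0 & 0 \end{pmatrix}^{\beta-\alpha} \hat{x}_{\alpha,\alpha}$,

    $S_{\beta,\alpha} = L_{\beta,\alpha} S_{\alpha,\alpha} L_{\beta,\alpha}^T + \sum_{\gamma=\alpha+1}^{\beta} L_{\beta,\gamma}\,(E v_\gamma v_\gamma^T)\, L_{\beta,\gamma}^T$.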
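Example III.4.b. Here we consider example III.1.c, and calculate $\hat{\xi}$ sequentially. Since $\xi$ is constant, we denote by $\hat{\xi}_0, \hat{\xi}_1, \ldots, \hat{\xi}_m$ the best estimate of $\xi$ based on no observation, one observation, $\ldots$, $m$ observations, and since successive errors of observation are independent, we need not compute their estimates nor their variances.

Assuming only the information presented, we have

    $\hat{\xi}_0 = E\xi = (\gamma_2 + \gamma_1)/2$,

    $E(\xi - E\xi)^2 = (\gamma_2 - \gamma_1)^2/12 \triangleq \sigma_0$, the variance of estimate of $\hat{\xi}_0$,

    the variance of each error of observation is $\lambda^2/3 \triangleq \theta$,

    $\hat{\xi}_1 = \hat{\xi}_0 + \sigma_0 (\sigma_0 + \theta)^{-1} (\eta_1 - \hat{\xi}_0)$.

Letting $\sigma_\alpha$ represent the variance of estimate of $\hat{\xi}_\alpha$,

    $\sigma_1 = \sigma_0 - \sigma_0^2 (\sigma_0 + \theta)^{-1} = \sigma_0 \theta (\sigma_0 + \theta)^{-1}$,

    $\sigma_{\alpha+1} = \sigma_\alpha - \sigma_\alpha^2 (\sigma_\alpha + \theta)^{-1} = \sigma_\alpha \theta (\sigma_\alpha + \theta)^{-1} = \sigma_0 \theta \left[ (\alpha+1)\sigma_0 + \theta \right]^{-1}$,

the final expression the result of a trivial induction. We recall from example III.1.c that the variance of $\tilde{\xi}$ is $\lambda^2/3m = \theta/m$, whereas the variance of $\hat{\xi}_m$ is here shown to be $\sigma_m = \sigma_0 \theta \left[ m\sigma_0 + \theta \right]^{-1}$. It is interesting to notice that the best linear estimate with no observation, $\hat{\xi}_0$, is superior to $\tilde{\xi}$ based on $m$ observations, when $m < \theta/\sigma_0 = 4\lambda^2/(\gamma_2 - \gamma_1)^2$.

It is also interesting to note that the best nonlinear estimate of $\xi$, which is far superior to $\hat{\xi}$ when $m$ is large, can be obtained sequentially. We state without proof that after the $\alpha$th observation, $\xi$ is uniformly distributed over an interval $[\lambda_{1,\alpha}, \lambda_{2,\alpha}]$, with $[\lambda_{1,0}, \lambda_{2,0}] = [\gamma_1, \gamma_2]$,

    $\lambda_{1,\alpha+1} = \max(\lambda_{1,\alpha},\ \eta_{\alpha+1} - \lambda)$,

    $\lambda_{2,\alpha+1} = \min(\lambda_{2,\alpha},\ \eta_{\alpha+1} + \lambda)$.

The best nonlinear estimate of $\xi$ after the $\alpha$th observation is $(\lambda_{1,\alpha} + \lambda_{2,\alpha})/2$, with variance of estimate $(\lambda_{2,\alpha} - \lambda_{1,\alpha})^2/12$.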
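Both sequential schemes of this example are short enough to sketch directly. The code below is an illustration only: the prior interval, the error half-width, and the observations are hypothetical numbers, and the function names are not from the report. It runs the linear recursion for the estimate and its variance, compares the variance with the closed form obtained by induction, and tracks the shrinking interval used by the nonlinear estimate.

```python
def linear_step(xi_hat, sigma, eta, theta):
    """One step of the linear recursion: new estimate and its variance."""
    gain = sigma / (sigma + theta)
    return xi_hat + gain * (eta - xi_hat), sigma * theta / (sigma + theta)


def interval_step(lo, hi, eta, lam):
    """Shrink the interval known to contain xi after observing eta."""
    return max(lo, eta - lam), min(hi, eta + lam)


gamma1, gamma2, lam = 0.0, 6.0, 1.5            # hypothetical prior and error width
observations = [2.9, 3.4, 2.2, 3.1, 2.6]       # hypothetical data
sigma0 = (gamma2 - gamma1) ** 2 / 12.0         # variance of the prior estimate
theta = lam ** 2 / 3.0                         # variance of one observation error

xi_hat, sigma = (gamma1 + gamma2) / 2.0, sigma0
lo, hi = gamma1, gamma2
for eta in observations:
    xi_hat, sigma = linear_step(xi_hat, sigma, eta, theta)
    lo, hi = interval_step(lo, hi, eta, lam)

m = len(observations)
closed_form = sigma0 * theta / (m * sigma0 + theta)   # result of the induction
nonlinear_estimate = (lo + hi) / 2.0
nonlinear_variance = (hi - lo) ** 2 / 12.0
```

The linear variance depends only on the number of observations, whereas the nonlinear variance depends on the data actually observed, which is why the two cannot be ranked for a particular short sample.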

III.5 Sequential Calculation of $\tilde{x}$. Sequential calculation of $\tilde{x}$ will be desirable in case one wishes to obtain $\tilde{x}$ first on the basis of one set of observations, then on an enlarged set of observations, etc. It is tempting to suppose that $\tilde{x}$ can be calculated sequentially by using a preliminary calculation of $\tilde{x}$ for initial estimates in Theorem 2, then calculating the resulting $\hat{x}$. This supposition is not correct in general, but in view of the fact that $\tilde{x}$ can be considered a limiting value of $\hat{x}$ (cf. Section III), the technique can be used in either of two cases: a) the errors of observation of the various sets of observations are mutually orthogonal; or, b) the initial calculation of $\tilde{x}$ is based on a set of observations with nonsingular $M$. While these assertions are intuitively not difficult to understand, the proofs are somewhat cumbersome.

In case (a), consider two sets of observations:

    $y_1 = M_1 x + v_1$,
    $y_2 = M_2 x + v_2$,

with

    $R = E \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} (v_1^T,\ v_2^T) = \begin{pmatrix} R_1 & 0 \\ 0 & R_2 \end{pmatrix}$.

Then

    $\tilde{S} = \left[ (M_1^T,\ M_2^T)\, R^{-1} \begin{pmatrix} M_1 \\ M_2 \end{pmatrix} \right]^{-1} = (M_1^T R_1^{-1} M_1 + M_2^T R_2^{-1} M_2)^{-1}$,

and the variance of $\tilde{x}_1$, based on $y_1$ alone, is $\tilde{S}_1 = (M_1^T R_1^{-1} M_1)^{-1}$. Calculating $\hat{x}$ based on $y_2$, with initial estimates and variances $\tilde{x}_1$ and $\tilde{S}_1$, we have from example III.2.a,

    $S = \tilde{S}_1 - \tilde{S}_1 M_2^T (M_2 \tilde{S}_1 M_2^T + R_2)^{-1} M_2 \tilde{S}_1$
    $\ \ = \tilde{S}_1^{1/2} \left[ I - \tilde{S}_1^{1/2} M_2^T R_2^{-1/2} \left( R_2^{-1/2} M_2 \tilde{S}_1 M_2^T R_2^{-1/2} + I \right)^{-1} R_2^{-1/2} M_2 \tilde{S}_1^{1/2} \right] \tilde{S}_1^{1/2}$

by evidently justified manipulations, $\tilde{S}_1$ and $R_2$ being positive definite.

Applying Lemma A, with $K = \tilde{S}_1^{1/2} M_2^T R_2^{-1/2}$, the above relation gives

    $S = \tilde{S}_1^{1/2} (K K^T + I)^{-1} \tilde{S}_1^{1/2} = (M_2^T R_2^{-1} M_2 + \tilde{S}_1^{-1})^{-1} = \tilde{S}$.

Since $S = \tilde{S}$, $\hat{x}$ must equal $\tilde{x}$, the minimum variance being attained uniquely by $\hat{x}$, from Theorem 1.
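Case (a) can be checked numerically in the simplest setting. The sketch below assumes a scalar unknown and two scalar observations with orthogonal errors; the coefficients and error variances are hypothetical. It compares the variance obtained from both observations at once with the variance obtained by starting from the first observation alone and then updating with the second.

```python
def batch_variance(m1, r1, m2, r2):
    """Variance from both observations at once: (M^T R^-1 M)^-1, scalar case."""
    return 1.0 / (m1 * m1 / r1 + m2 * m2 / r2)


def sequential_variance(m1, r1, m2, r2):
    """Variance from y1 alone, then updated with y2 as a new observation."""
    s1 = r1 / (m1 * m1)                          # (M1^T R1^-1 M1)^-1
    return s1 - s1 * m2 * m2 * s1 / (m2 * s1 * m2 + r2)


both = batch_variance(2.0, 0.5, 3.0, 1.5)        # hypothetical values
stepwise = sequential_variance(2.0, 0.5, 3.0, 1.5)
```

For these values both routes give 1/14, in agreement with the equality of the two variances established above for orthogonal errors.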
In the second case we consider

    $y_1 = M_1 x + v_1$,
    $y_2 = M_2 x + v_2$,

with

    $R = \begin{pmatrix} R_{11} & R_{12} \\ R_{12}^T & R_{22} \end{pmatrix}$,    $M_1$ nonsingular.

Since $M_1$ is nonsingular, we have $\tilde{x}_1 = M_1^{-1} y_1$, so that $x - \tilde{x}_1 = -M_1^{-1} v_1$, $z = y_2 - M_2 \tilde{x}_1 = v_2 - M_2 M_1^{-1} v_1$, and $\tilde{S}_1 = M_1^{-1} R_{11} (M_1^T)^{-1}$.

We compute

    $E(x - \tilde{x}_1) z^T = M_1^{-1} \left[ R_{11} (M_1^T)^{-1} M_2^T - R_{12} \right]$,

    $E z z^T = R_{22} - R_{12}^T (M_1^T)^{-1} M_2^T - M_2 M_1^{-1} R_{12} + M_2 M_1^{-1} R_{11} (M_1^T)^{-1} M_2^T$
    $\ \ = \left( M_2 M_1^{-1} R_{11}^{1/2} - R_{12}^T R_{11}^{-1/2} \right) \left( R_{11}^{1/2} (M_1^T)^{-1} M_2^T - R_{11}^{-1/2} R_{12} \right) + R_{22} - R_{12}^T R_{11}^{-1} R_{12}$.

Thus the $S$ calculated on the basis of $y_2$ and initial value of $\tilde{x}_1$, is

    $S = \tilde{S}_1 - \left[ E(x - \tilde{x}_1) z^T \right] (E z z^T)^{-1} \left[ E z (x - \tilde{x}_1)^T \right]$
    $\ \ = M_1^{-1} R_{11} (M_1^T)^{-1} - M_1^{-1} \left[ R_{11} (M_1^T)^{-1} M_2^T - R_{12} \right] (E z z^T)^{-1} \left[ M_2 M_1^{-1} R_{11} - R_{12}^T \right] (M_1^T)^{-1}$.

Some evident factoring, and use of

    $P_{22} \triangleq R_{22} - R_{12}^T R_{11}^{-1} R_{12}$,

yields

    $S = M_1^{-1} R_{11}^{1/2} \left\{ I - G \left( G^T G + P_{22} \right)^{-1} G^T \right\} R_{11}^{1/2} (M_1^T)^{-1}$,    $G \triangleq R_{11}^{1/2} (M_1^T)^{-1} M_2^T - R_{11}^{-1/2} R_{12}$.

Applying Lemma A to this expression, with

    $K = \left( R_{11}^{1/2} (M_1^T)^{-1} M_2^T - R_{11}^{-1/2} R_{12} \right) P_{22}^{-1/2}$,

we obtain

    $S = M_1^{-1} R_{11}^{1/2} \left[ K K^T + I \right]^{-1} R_{11}^{1/2} (M_1^T)^{-1}$,

giving

    $S = \left[ (M_2^T - M_1^T R_{11}^{-1} R_{12})\, P_{22}^{-1}\, (M_2 - R_{12}^T R_{11}^{-1} M_1) + M_1^T R_{11}^{-1} M_1 \right]^{-1}$.

It is easily seen that this is $\tilde{S} = (M^T R^{-1} M)^{-1}$, since

    $R^{-1} = \begin{pmatrix} R_{11}^{-1} + R_{11}^{-1} R_{12} P_{22}^{-1} R_{12}^T R_{11}^{-1} & -R_{11}^{-1} R_{12} P_{22}^{-1} \\ -P_{22}^{-1} R_{12}^T R_{11}^{-1} & P_{22}^{-1} \end{pmatrix}$.

Thus the equality of $S$ and $\tilde{S}$ is established, so that $\hat{x} = \tilde{x}$ as in the previous case.
Even though observations are made only at discrete time points, it is desirable to consider a continuous process $x(\tau)$ correctly, rather than as a discontinuous process. This is especially true in case it is desired to predict future values of the $x$ process, not necessarily only at times of anticipated observation. In this section we consider continuous processes corresponding to the discrete processes specified in Theorem 2, and develop continuous extrapolation of best estimates and their variances of estimate. In IV.2 cases of simple prediction are considered; these are cases where the best predictions depend only upon the present values of scalar observations and their derivatives. In IV.3 the limiting case of observations with superposed random impulse functions is developed.

IV.1 Continuous Linear Structure

In the preceding section we considered processes with a linear structure:

    $x_\alpha = L_{\alpha,\alpha-1}\, x_{\alpha-1} + v_\alpha$.

We wish now to subsume these processes by continuous processes with a similar linear structure. This is frequently done by expressing the structure as¹

(IV.1)    $x'(\tau) = B(\tau)\, x(\tau) + v$,

where $B$ is a matrix function of time and $v$ is a random impulse function (fictitious derivative of a process of orthogonal increments). Although this is a reasonable characterisation, it is possible and for some purposes preferable to express the structure of the process without introducing such random impulse functions, and we shall do so in the following theorem. In practice, one may frequently view the structure as specified either way, with the symmetric non-negative definite matrix $Q(\tau)$ the rate of growth of variance of the process of orthogonal increments, of which $v$ is the fictitious derivative.

¹We recall that we denote $dF(\tau)/d\tau$ and $dF(\tau,\sigma)/d\tau$ by $F'(\tau)$ and $F'(\tau,\sigma)$ respectively.

If (IV.1) is to represent the structure of $x(\tau)$ it is clear that the vector $x(\tau)$ must be a complete state vector, including as components all existing non-zero derivatives of components, and can not in general be restricted to a particular set of linear combinations of components (sometimes called the signal) which may be of special interest. If the $i$th diagonal element of $Q(\tau)$ is non-zero, no derivative of $\xi_i(\tau)$, the $i$th component of $x(\tau)$, can exist. The following theorem exhibits a general process structure replacing (IV.1) and identifiable with the discrete process structure of Theorem 2, with $L_{\alpha,\alpha-1}$ non-singular, and shows the best linear predictor of $x(\tau)$, and its variance of estimate. (In Theorem 3, $x(\tau)$, $\hat{x}(\tau,\tau_1)$, and $w(\tau_2,\tau_1)$ are column vectors of $\rho$ components; $K(\tau_2,\tau_1)$, $B(\sigma)$, $Q(\sigma)$, and $S(\tau,\tau_1)$ are $\rho \times \rho$.)

Theorem 3. Over a finite interval let $B(\sigma)$ and $Q(\sigma)$ be bounded piecewise continuous square matrices, with $Q(\sigma)$ symmetric and non-negative definite. For any $\tau_1 \le \tau_2$ in the interval let $K(\tau_1,\tau_1) = I$ and

(3.0.1)    $K'(\tau_2,\tau_1) = B(\tau_2)\, K(\tau_2,\tau_1)$;

let $x(\tau)$ be a random process with $x(\tau_1)$ having finite variance and

(3.0.2)    $x(\tau_2) = K(\tau_2,\tau_1)\, x(\tau_1) + w(\tau_2,\tau_1)$,

where the random variables $w(\tau_2,\tau_1)$ have mean zero, variance

(3.0.3)    $E\, w(\tau_2,\tau_1)\, w^T(\tau_2,\tau_1) = \int_{\tau_1}^{\tau_2} K(\tau_2,\sigma)\, Q(\sigma)\, K^T(\tau_2,\sigma)\, d\sigma$,

are mutually orthogonal for non-overlapping intervals [i.e., for $\tau_1 \le \tau_2 \le \tau_3 \le \tau_4$, $E\, w(\tau_4,\tau_3)\, w^T(\tau_2,\tau_1) = 0$], and are orthogonal to $x(\tau)$ and $\hat{x}(\tau,\tau)$ for any $\tau \le \tau_1$.
Conclusion A. For any $\tau_0, \tau_1, \ldots, \tau_\nu$ in the interval, with $\tau_0 < \tau_1 < \cdots < \tau_\nu$, the hypotheses of Theorem 2 regarding $x_\alpha$, $L_{\alpha,\alpha-1}$, and $v_\alpha$ are satisfied by $x(\tau_\alpha)$, $K(\tau_\alpha,\tau_{\alpha-1})$, and $w(\tau_\alpha,\tau_{\alpha-1})$, respectively.

Conclusion B. For $\tau \ge \tau_1$, let $\hat{x}(\tau,\tau_1)$ and $S(\tau,\tau_1)$ represent the minimum variance linear estimate of $x(\tau)$ based on observations up to time $\tau_1$, and the corresponding variance of estimate. Then

(3.1.1)    $\hat{x}'(\tau,\tau_1) \triangleq \partial \hat{x}(\tau,\tau_1)/\partial\tau = B(\tau)\, \hat{x}(\tau,\tau_1)$,

(3.1.2)    $S'(\tau,\tau_1) \triangleq \partial S(\tau,\tau_1)/\partial\tau = B(\tau)\, S(\tau,\tau_1) + S(\tau,\tau_1)\, B^T(\tau) + Q(\tau)$.

Proof A. The identification of $x(\tau_\alpha)$ with $x_\alpha$, $K(\tau_\alpha,\tau_{\alpha-1})$ with $L_{\alpha,\alpha-1}$, and $v_\alpha$ with $w(\tau_\alpha,\tau_{\alpha-1})$ is evident on comparing (3.0.2) with (2.0.1). $K(\tau_2,\tau_1)$ is a bounded constant by (3.0.1), $B(\sigma)$ being integrable by hypothesis. The $w(\tau_\alpha,\tau_{\alpha-1})$, $\alpha = 1, 2, \ldots, \nu$, are mutually orthogonal by hypothesis, of finite variance by (3.0.3) and hypotheses on $B(\sigma)$ and $Q(\sigma)$. The variables are orthogonal to $\hat{x}_{0,0}$ and $x_0$ by the final hypothesis. Finite second moments of $x$ follow from (3.0.2) and finite variance of $x(\tau_1)$.
Proof B. From Theorem 2, Conclusion A, and (3.0.3),

(3.2.1)    $\hat{x}(\tau,\tau_1) = K(\tau,\tau_1)\, \hat{x}(\tau_1,\tau_1)$,

(3.2.2)    $S(\tau,\tau_1) = K(\tau,\tau_1)\, S(\tau_1,\tau_1)\, K^T(\tau,\tau_1) + \int_{\tau_1}^{\tau} K(\tau,\sigma)\, Q(\sigma)\, K^T(\tau,\sigma)\, d\sigma$.

Differentiating (3.2.1) and (3.2.2) with respect to $\tau$, and using (3.0.1), one obtains (3.1.1) and (3.1.2).

If $B(\tau)$ is constant over $(\tau_1,\tau_2)$, then $K(\tau_2,\tau_1) = e^{(\tau_2-\tau_1)B}$.

The processes $x(\tau)$ postulated in the theorem are continuous vector Markov processes in the wide sense, without discontinuities in first or second moments. Such processes are typical in practical applications. Problems in which $x(\tau)$ does not satisfy the requirements of the theorem may require the use of Theorem 2 as an alternative or supplement to Theorem 3.

Example IV.1.a. An example from Wiener [1949] assumes observation on an ergodic process with spectral density $1/(1 + \omega^2)$. Here we have $\eta = y = x = \xi$.

To specify problems of this nature, one can factor the spectral density into conjugate factors in $p = j\omega$, one of which has neither poles nor zeroes in the lower half plane, and regard $\eta(\tau)$ as the result of this factor operating on a random impulse function. Thus in this example, $\eta(\tau) = (1 + p)^{-1} v$, with $v$ a (fictitious) random impulse function. Operational calculus gives $\eta(\tau) + \eta'(\tau) = v$, or $\xi'(\tau) = -1 \cdot \xi(\tau) + 1 \cdot v$, so that $B = -1$, $Q = 1$.

Clearly $\hat{\xi}(\tau,\tau) = \eta(\tau)$, the observation, and $S(\tau,\tau) = 0$. Assuming observations extending to time $\tau$, we can predict $\xi(\tau + \alpha)$ and its variance by (3.1.1), (3.1.2):

    $\hat{\xi}'(\tau + \alpha,\tau) = B(\tau + \alpha)\, \hat{\xi}(\tau + \alpha,\tau) = -\hat{\xi}(\tau + \alpha,\tau)$,

    $S'(\tau + \alpha,\tau) = S(\tau + \alpha,\tau)\, B^T(\tau + \alpha) + B(\tau + \alpha)\, S(\tau + \alpha,\tau) + Q(\tau + \alpha) = -2 S(\tau + \alpha,\tau) + 1$,

from which

    $\hat{\xi}(\tau + \alpha,\tau) = \hat{\xi}(\tau,\tau)\, e^{-\alpha} = \eta(\tau)\, e^{-\alpha}$,

    $S(\tau + \alpha,\tau) = (1 - e^{-2\alpha})/2$.
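The variance equation of this example can be integrated numerically as a check on the closed form. The sketch below treats the scalar case of (3.1.2) with constant $B$ and $Q$; the function name, the Runge-Kutta scheme, and the step count are choices made for the illustration and are not part of the report.

```python
import math


def propagate_variance(s0, b, q, alpha, steps=1000):
    """Integrate S' = 2*b*S + q, the scalar form of (3.1.2), by classical RK4."""
    def rate(s):
        return 2.0 * b * s + q

    h = alpha / steps
    s = s0
    for _ in range(steps):
        k1 = rate(s)
        k2 = rate(s + 0.5 * h * k1)
        k3 = rate(s + 0.5 * h * k2)
        k4 = rate(s + h * k3)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return s


alpha = 0.7                                          # hypothetical prediction span
numerical = propagate_variance(0.0, -1.0, 1.0, alpha)
closed_form = (1.0 - math.exp(-2.0 * alpha)) / 2.0
```

As the prediction span grows, both values approach 1/2, the variance of the stationary process itself, so that a long-range prediction is no better than the process mean.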


IV.2. Simple Prediction.

It is well known (e.g., Doob [1953]) that prediction of an observed stable scalar process with spectral density of the form $1/\left| \sum_{k=0}^{\rho} a_k (j\omega)^k \right|^2$, with $j = \sqrt{-1}$, depends only upon the values, at the time of final observation, of the variable and its derivatives up to the $(\rho - 1)$th, which exist. Let the observation be $\eta_1(\tau) = \xi_1(\tau)$; then the successive derivatives give $\eta_2(\tau) \triangleq \eta_1'(\tau) = \xi_2(\tau)$, etc. The complete state vector is $x(\tau) = (\xi_1(\tau), \ldots, \xi_\rho(\tau))^T$; equating $\eta(\tau)$ and its derivatives to $\xi_1(\tau), \ldots, \xi_\rho(\tau)$, we have $\hat{x}(\tau,\tau) = x(\tau)$, $S = 0$.

This result can be generalized to nonstationary cases. Suppose that the complete state vector is $x(\tau) = (\xi_1(\tau), \xi_1'(\tau), \ldots)^T$, and that over an arbitrarily short interval ending at time $\tau$, the observation $\eta(\sigma) = \xi_1(\sigma)$. The derivatives of $\eta(\tau)$ are the derivatives of $\xi_1(\tau)$, so that $x(\tau)$ can be determined with zero error as $(\eta(\tau), \eta'(\tau), \ldots)^T = \hat{x}(\tau,\tau)$.

The best prediction is obtained by applying (3.1.1) with initial values $\hat{x}(\tau,\tau) = (\eta(\tau), \eta'(\tau), \ldots)^T$. The initial variance $S(\tau,\tau)$ is zero in relation (3.1.2):

    $S'(\tau + \alpha,\tau) = B(\tau + \alpha)\, S(\tau + \alpha,\tau) + S(\tau + \alpha,\tau)\, B^T(\tau + \alpha) + Q(\tau + \alpha)$.

We note that the essential requirements for simple prediction, i.e., estimation with zero error based on observations over an arbitrarily short interval, are twofold: first, the vector $x$ must consist of a component and possibly some of its derivatives; second, the observation $\eta(\tau)$ over the interval must be that component. Absence of "error of observation" is not sufficient for simple prediction.

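Example IV.2.b. An example from Wiener [1949] assumes observation of an ergodic process with spectral density $1/(1 + \omega^4)$. We have

    $x(\tau) = \begin{pmatrix} \xi_1(\tau) \\ \xi_2(\tau) \end{pmatrix} = \begin{pmatrix} \xi(\tau) \\ \xi'(\tau) \end{pmatrix}$,    with $A = (1, 0)$.

Here the stable factor of $1/(1 + \omega^4)$ is $1/(1 + \sqrt{2}\,p + p^2)$. Setting $\xi_1(\tau) = \eta(\tau) = (1 + \sqrt{2}\,p + p^2)^{-1} v$, we have $\xi_1(\tau) + \sqrt{2}\, \xi_1'(\tau) + \xi_1''(\tau) = v$. Setting $\xi_1'(\tau) = \xi_2(\tau)$ gives $\xi_2'(\tau) = -\xi_1(\tau) - \sqrt{2}\, \xi_2(\tau) + 1 \cdot v$, so that

    $B = \begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix}$,    $Q = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

This is simple prediction since $\eta(\tau) = \xi_1(\tau)$ and $\hat{x}(\tau,\tau) = (\eta(\tau), \eta'(\tau))^T$. In writing the solution, to shorten notation we write $\hat{x}$ for $\hat{x}(\tau + \alpha,\tau)$, $S$ for $S(\tau + \alpha,\tau)$, and let $S = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$. The prediction equations are

    $\hat{x}' = \begin{pmatrix} \hat{\xi}_1' \\ \hat{\xi}_2' \end{pmatrix} = \begin{pmatrix} \hat{\xi}_2 \\ -\hat{\xi}_1 - \sqrt{2}\, \hat{\xi}_2 \end{pmatrix}$,

    $S' = \begin{pmatrix} 2\sigma_{12} & \sigma_{22} - \sigma_{11} - \sqrt{2}\,\sigma_{12} \\ \sigma_{22} - \sigma_{11} - \sqrt{2}\,\sigma_{12} & 1 - 2\sigma_{12} - 2\sqrt{2}\,\sigma_{22} \end{pmatrix}$,

with initial conditions $\hat{x}(\tau,\tau) = (\eta(\tau), \eta'(\tau))^T$, $S(\tau,\tau) = 0$. The solutions of the differential equations are:

    $\hat{\xi}_1(\tau + \alpha,\tau) = e^{-\alpha/\sqrt{2}} \left[ (\cos \alpha/\sqrt{2} + \sin \alpha/\sqrt{2})\, \eta(\tau) + \sqrt{2}\, (\sin \alpha/\sqrt{2})\, \eta'(\tau) \right]$,

    $\hat{\xi}_2(\tau + \alpha,\tau) = e^{-\alpha/\sqrt{2}} \left[ -\sqrt{2}\, (\sin \alpha/\sqrt{2})\, \eta(\tau) + (\cos \alpha/\sqrt{2} - \sin \alpha/\sqrt{2})\, \eta'(\tau) \right]$,

    $\sigma_{11}(\tau + \alpha,\tau) = \frac{1}{2\sqrt{2}} \left[ 1 - (2 - \cos \sqrt{2}\alpha + \sin \sqrt{2}\alpha)\, e^{-\sqrt{2}\alpha} \right]$,

    $\sigma_{12}(\tau + \alpha,\tau) = \frac{1}{2} (1 - \cos \sqrt{2}\alpha)\, e^{-\sqrt{2}\alpha}$,

    $\sigma_{22}(\tau + \alpha,\tau) = \frac{1}{2\sqrt{2}} \left[ 1 - (2 - \cos \sqrt{2}\alpha - \sin \sqrt{2}\alpha)\, e^{-\sqrt{2}\alpha} \right]$.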
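The variance solutions of this example can be compared with a direct numerical integration of (3.1.2). The sketch below is an illustration under the stated $B$ and $Q$; the helper names, the Runge-Kutta scheme, the step count, and the prediction span are assumptions made for the check.

```python
import math

SQRT2 = math.sqrt(2.0)
B = [[0.0, 1.0], [-1.0, -SQRT2]]
Q = [[0.0, 0.0], [0.0, 1.0]]


def s_rate(S):
    """Right side of (3.1.2), B S + S B^T + Q, for a symmetric 2x2 S."""
    BS = [[sum(B[i][k] * S[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[BS[i][j] + BS[j][i] + Q[i][j] for j in range(2)] for i in range(2)]


def shifted(S, K, h):
    """S + h K, entry by entry."""
    return [[S[i][j] + h * K[i][j] for j in range(2)] for i in range(2)]


def integrate(alpha, steps=2000):
    """Classical RK4 from S(tau, tau) = 0 to S(tau + alpha, tau)."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    h = alpha / steps
    for _ in range(steps):
        k1 = s_rate(S)
        k2 = s_rate(shifted(S, k1, 0.5 * h))
        k3 = s_rate(shifted(S, k2, 0.5 * h))
        k4 = s_rate(shifted(S, k3, h))
        S = [[S[i][j] + h * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j]) / 6.0
              for j in range(2)] for i in range(2)]
    return S


def closed_form(alpha):
    """The variance solutions stated in the example."""
    e = math.exp(-SQRT2 * alpha)
    c, s = math.cos(SQRT2 * alpha), math.sin(SQRT2 * alpha)
    k = 1.0 / (2.0 * SQRT2)
    s12 = 0.5 * (1.0 - c) * e
    return [[k * (1.0 - (2.0 - c + s) * e), s12],
            [s12, k * (1.0 - (2.0 - c - s) * e)]]


numerical = integrate(1.3)        # hypothetical prediction span
analytic = closed_form(1.3)
```

For large spans both diagonal entries tend to $1/(2\sqrt{2})$, the stationary variance of the process and of its derivative, and the off-diagonal entry tends to zero.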


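Example IV.2.c. From Wiener [1949] we have the above example except with spectral density $1/(1 + \omega^2)^2$. $x(\tau)$, $A(\tau)$, $Q(\tau)$ are as before, but now

    $B = \begin{pmatrix} 0 & 1 \\ -1 & -2 \end{pmatrix}$.

Routine substitution gives, using short notation as before,

    $\hat{x}' = \begin{pmatrix} \hat{\xi}_1' \\ \hat{\xi}_2' \end{pmatrix} = \begin{pmatrix} \hat{\xi}_2 \\ -\hat{\xi}_1 - 2\hat{\xi}_2 \end{pmatrix}$,

    $S' = \begin{pmatrix} 2\sigma_{12} & \sigma_{22} - \sigma_{11} - 2\sigma_{12} \\ \sigma_{22} - \sigma_{11} - 2\sigma_{12} & 1 - 2\sigma_{12} - 4\sigma_{22} \end{pmatrix}$,

with initial conditions $\hat{x}(\tau,\tau) = (\eta(\tau), \eta'(\tau))^T$, $S(\tau,\tau) = 0$. The solutions of the differential equations are:

    $\hat{\xi}_1(\tau + \alpha,\tau) = (1 + \alpha)\, e^{-\alpha}\, \eta(\tau) + \alpha e^{-\alpha}\, \eta'(\tau)$,

    $\hat{\xi}_2(\tau + \alpha,\tau) = -\alpha e^{-\alpha}\, \eta(\tau) + (1 - \alpha)\, e^{-\alpha}\, \eta'(\tau)$,

    $\sigma_{11}(\tau + \alpha,\tau) = \frac{1}{4} \left[ 1 - (1 + 2\alpha + 2\alpha^2)\, e^{-2\alpha} \right]$,

    $\sigma_{12}(\tau + \alpha,\tau) = (\alpha^2 e^{-2\alpha})/2$,

    $\sigma_{22}(\tau + \alpha,\tau) = \frac{1}{4} \left[ 1 - (1 - 2\alpha + 2\alpha^2)\, e^{-2\alpha} \right]$.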

Example IV.2.d. Wiener [1949] considers an ergodic process with spectral density $e^{-\omega^2}$, theoretically completely predictable, and approximates it by a rational spectral density for prediction purposes. This is a limiting case of simple prediction, with all derivatives existing, and the complete state vector not of finite dimensions. The $B$ matrix is infinite.

To predict to any desired degree of precision, one can use the observation and its derivatives to an appropriate order, and extrapolate by a finite Taylor series.

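Example IV.2.e. An example from Wiener [1949], involving non-scalar observations, assumes that the two components of observed stationary processes have spectral densities $\phi_{11}(\omega) = \phi_{22}(\omega) = 1/(1 + \omega^2)$, $\phi_{12}(\omega) = \epsilon/(1 - j\omega)^2$, $\phi_{21}(\omega) = \epsilon/(1 + j\omega)^2$, with $0 < \epsilon < 1$, $j = \sqrt{-1}$. It is desired to predict each component.

To represent this problem in our terms, we note that the specified $\phi_{11}(\omega)$ and $\phi_{22}(\omega)$ would be obtained with $\eta_1 = (1 + j\omega)^{-1} v_1$ and $\eta_2 = \epsilon (1 + j\omega)^{-1} v_1 + \sqrt{1 - \epsilon^2}\, (1 + j\omega)^{-1} v_2$, with $v_1$ orthogonal to $v_2$, the forms being suggested by the $\epsilon$ in the cross-spectral densities. These trial values yield the expectation of $\eta_1$ times the complex conjugate of $\eta_2$ as $\epsilon [1 - (j\omega)^2]^{-1}$; the desired value of $\phi_{12}(\omega)$ is obtained if the first term of the trial $\eta_2$ is multiplied by $(1 - j\omega)(1 + j\omega)^{-1}$, and this does not affect $\phi_{22}$. Thus we may write

    $\eta_2 = \epsilon (1 - j\omega)(1 + j\omega)^{-2} v_1 + \sqrt{1 - \epsilon^2}\, (1 + j\omega)^{-1} v_2$,

    $\eta_1 = (1 + j\omega)^{-1} v_1 = (1 + j\omega)(1 + j\omega)^{-2} v_1$.

From inspection of these relations, and results of earlier examples, we have

    $\eta_1 = \xi_1 + \xi_2$,
    $\eta_2 = \epsilon (\xi_1 - \xi_2) + \sqrt{1 - \epsilon^2}\, \xi_3$,

    $B = \begin{pmatrix} 0 & 1 & 0 \\ -1 & -2 & 0 \\ 0 & 0 & -1 \end{pmatrix}$,    $Q = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$,    $A = \begin{pmatrix} 1 & 1 & 0 \\ \epsilon & -\epsilon & \sqrt{1 - \epsilon^2} \end{pmatrix}$.

For predicting the observation process $y(\tau)$, this is a case of simple prediction. Using the results of examples (IV.1.a) and (IV.2.c),

    $\hat{\eta}_1(\tau + \alpha,\tau) = \hat{\xi}_1(\tau + \alpha,\tau) + \hat{\xi}_2(\tau + \alpha,\tau) = \left[ \xi_1(\tau,\tau) + \xi_2(\tau,\tau) \right] e^{-\alpha} = e^{-\alpha}\, \eta_1(\tau)$,

    $\hat{\eta}_2(\tau + \alpha,\tau) = \sqrt{1 - \epsilon^2}\, \hat{\xi}_3(\tau + \alpha,\tau) + \epsilon \left[ \hat{\xi}_1(\tau + \alpha,\tau) - \hat{\xi}_2(\tau + \alpha,\tau) \right]$
    $\ \ = e^{-\alpha} \left\{ \sqrt{1 - \epsilon^2}\, \xi_3 + \epsilon \left[ (1 + 2\alpha)\, \xi_1 - (1 - 2\alpha)\, \xi_2 \right] \right\}$
    $\ \ = e^{-\alpha} \left[ \eta_2(\tau) + 2\alpha\epsilon\, \eta_1(\tau) \right]$.

The variance of estimate of $\hat{\eta}_1(\tau + \alpha,\tau)$ is $(1 - e^{-2\alpha})/2$, from example IV.1.a. That of $\hat{\eta}_2(\tau + \alpha,\tau)$ is

    $\frac{1}{2} - e^{-2\alpha} \left[ E\eta_2^2(\tau) + 4\alpha^2\epsilon^2\, E\eta_1^2(\tau) + 4\alpha\epsilon\, E\eta_1(\tau)\eta_2(\tau) \right]$.

The covariance of estimate of $\hat{\eta}_1(\tau + \alpha,\tau)$ and $\hat{\eta}_2(\tau + \alpha,\tau)$ is

    $-e^{-2\alpha}\, E\left\{ \eta_1(\tau) \left[ \eta_2(\tau) + 2\alpha\epsilon\, \eta_1(\tau) \right] \right\} = -\epsilon\alpha e^{-2\alpha}$.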


IV.3. The "White Noise" Limiting Case

A fictional limiting case of some interest is that in which, superposed on the linear function of $x(\tau)$, the observation contains so-called "white noise," that is, random impulse functions, or the fictitious derivative of a process of orthogonal increments. This limiting case can easily be derived from Theorem 2, which states that for any discrete observation $y_\alpha = A_\alpha x_\alpha$,

    $\Delta\hat{x}_\alpha \triangleq \hat{x}_{\alpha,\alpha} - \hat{x}_{\alpha,\alpha-1} = S_{\alpha,\alpha-1} A_\alpha^T (A_\alpha S_{\alpha,\alpha-1} A_\alpha^T)^{-1} (y_\alpha - A_\alpha \hat{x}_{\alpha,\alpha-1})$.

If we deal with $y_\alpha = A_\alpha x_\alpha + n_\alpha$, where $n_\alpha$ is orthogonal to $x$ and to $n_{\alpha-1}, \ldots$, etc., the relation remains valid with $E n_\alpha n_\alpha^T$ added to $A_\alpha S_{\alpha,\alpha-1} A_\alpha^T$, since the predicted value of $n_\alpha$ is zero. Now let the time interval between observations be $\delta$, with $E n_\alpha n_\alpha^T = N_\alpha/\delta$, obtaining for the increment $\Delta\hat{x}_\alpha$:

    $\Delta\hat{x}_\alpha = S_{\alpha,\alpha-1} A_\alpha^T (A_\alpha S_{\alpha,\alpha-1} A_\alpha^T + N_\alpha/\delta)^{-1} (y_\alpha - A_\alpha \hat{x}_{\alpha,\alpha-1})$.

Considering a sequence of decreasing $\delta$, tending to zero, one has, with $\tau$ the time of the observation,

    $\dot{\hat{x}}(\tau,\tau) \triangleq \lim_{\delta \to 0} (\Delta\hat{x}_\alpha/\delta)$
    $\ \ = \lim \left( S_{\alpha,\alpha-1} A_\alpha^T (\delta A_\alpha S_{\alpha,\alpha-1} A_\alpha^T + N_\alpha)^{-1} (y_\alpha - A_\alpha \hat{x}_{\alpha,\alpha-1}) \right)$
    $\ \ = S(\tau,\tau)\, A^T(\tau)\, N^{-1}(\tau) \left[ y(\tau) - A(\tau)\, \hat{x}(\tau,\tau) \right]$,

provided that $A(\tau)$ and $N(\tau)$ are continuous. By a similar calculation it is easily seen that

    $\dot{S}(\tau,\tau) = -S(\tau,\tau)\, A^T(\tau)\, N^{-1}(\tau)\, A(\tau)\, S(\tau,\tau)$.

Combining these results with those of Theorem 3, we have the general solution of this limiting case in terms of differential equations. It is necessary only that the matrix functions of time, $A(\tau)$, $B(\tau)$, $Q(\tau)$, and $N(\tau)$, be piecewise continuous. There are numerous cases in practice where the simplicity of this limiting case outweighs its defects, just as there are numerous cases in which simplicity of the limiting case in which one considers his processes stationary and $A(\tau)$ constant outweighs resulting deficiencies.

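Example IV.3.a. [Wiener, 1949.] One observes the sum of a stationary process (the "signal") with spectral density $1/(1 + \omega^2)$, and "white noise" of spectral density $\epsilon^2$. As earlier, the signal $\xi$ has $A = 1$, $B = -1$, $Q = 1$, while the noise term has $N = \epsilon^2$. Thus,

    $\dot{\hat{\xi}}(\tau,\tau) = S(\tau,\tau)\, A^T N^{-1} \left[ \eta(\tau) - A\hat{\xi}(\tau,\tau) \right] = S(\tau,\tau)\, \epsilon^{-2} \left[ \eta(\tau) - \hat{\xi}(\tau,\tau) \right]$,

    $\dot{S}(\tau,\tau) = -S(\tau,\tau)\, A^T N^{-1} A\, S(\tau,\tau) = -S^2(\tau,\tau)/\epsilon^2$.

From Theorem 3 we have

    $\hat{\xi}'(\tau,\tau) = -\hat{\xi}(\tau,\tau)$,

    $S'(\tau,\tau) = 1 - 2S(\tau,\tau)$.

Suppose that one observes from time 0 to time $\tau$ (possibly with $\hat{\xi}(0,0) = 0$, $S(0,0) = 1/2$, from the nature of the process). During observation, we have

    $dS(\tau,\tau)/d\tau = \dot{S}(\tau,\tau) + S'(\tau,\tau) = 1 - 2S(\tau,\tau) - S^2(\tau,\tau)/\epsilon^2$.

The differential equation for $S(\tau,\tau)$ has the solution, with $\gamma \triangleq \sqrt{1 + \epsilon^2}/\epsilon$,

    $S(\tau,\tau) = \frac{\left[ (\gamma - 1)\, S(0,0) + 1 \right] + \left[ (\gamma + 1)\, S(0,0) - 1 \right] e^{-2\gamma\tau}}{(\gamma + 1) \left[ (\gamma - 1)\, S(0,0) + 1 \right] - (\gamma - 1) \left[ (\gamma + 1)\, S(0,0) - 1 \right] e^{-2\gamma\tau}}$,

with $1/(\gamma + 1) = \epsilon (\sqrt{1 + \epsilon^2} - \epsilon)$ the asymptotic value of $S(\tau,\tau)$ as $\tau$ increases without limit.

As in example IV.1.a, prediction from time $\tau$ to time $(\tau + \alpha)$ is given by:

    $\hat{\xi}(\tau + \alpha,\tau) = \hat{\xi}(\tau,\tau)\, e^{-\alpha}$,

    $S(\tau + \alpha,\tau) = \frac{1}{2} + \left[ S(\tau,\tau) - \frac{1}{2} \right] e^{-2\alpha}$.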
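The variance equation during observation in this example, $dS/d\tau = 1 - 2S - S^2/\epsilon^2$, is easily integrated numerically. The sketch below is an illustration, assuming the scalar model $A = 1$, $B = -1$, $Q = 1$, $N = \epsilon^2$; the function names, the value of $\epsilon$, and the integration scheme are choices made for the example. The function `closed_form` codes one way of writing the solution of this equation in terms of $\gamma = \sqrt{1 + \epsilon^2}/\epsilon$.

```python
import math


def filter_variance(s0, eps, tau, steps=4000):
    """RK4 integration of dS/dtau = 1 - 2 S - S**2 / eps**2 from S(0,0) = s0."""
    def rate(s):
        return 1.0 - 2.0 * s - s * s / (eps * eps)

    h = tau / steps
    s = s0
    for _ in range(steps):
        k1 = rate(s)
        k2 = rate(s + 0.5 * h * k1)
        k3 = rate(s + 0.5 * h * k2)
        k4 = rate(s + h * k3)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return s


def closed_form(s0, eps, tau):
    """Solution of the same equation written with gamma = sqrt(1 + eps^2) / eps."""
    g = math.sqrt(1.0 + eps * eps) / eps
    a = (g - 1.0) * s0 + 1.0
    b = (g + 1.0) * s0 - 1.0
    r = math.exp(-2.0 * g * tau)
    return (a + b * r) / ((g + 1.0) * a - (g - 1.0) * b * r)


eps = 0.8                                             # hypothetical noise level
asymptote = eps * (math.sqrt(1.0 + eps * eps) - eps)  # equals 1 / (gamma + 1)
short_run = filter_variance(0.5, eps, 0.6)
long_run = filter_variance(0.5, eps, 20.0)
```

Starting from the stationary variance 1/2, the variance of estimate falls monotonically toward the asymptotic value, which is smaller the weaker the noise.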

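Example IV.3.b. [Wiener, 1949.] One observes the sum of a stationary signal with spectral density $1/(1 + \omega^4)$ and white noise of spectral density $\epsilon^4$. It is desired to estimate the present value of the derivative of the signal. As in an earlier example, we have

    $x(\tau) = \begin{pmatrix} \xi_1(\tau) \\ \xi_2(\tau) \end{pmatrix}$,    $A = (1, 0)$,    $B = \begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix}$,    $Q = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$,

and $N = \epsilon^4$.

We consider observation from time 0 to time $\tau_1$. If our only information at time 0 is the nature of the stationary process, we have $\hat{\xi}_1(0,0) = \hat{\xi}_2(0,0) = 0$,

    $S(0,0) = \begin{pmatrix} \sqrt{2}/4 & 0 \\ 0 & \sqrt{2}/4 \end{pmatrix}$.

We shall write

    $S(\tau + \alpha,\tau) = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} (\tau + \alpha,\tau)$.

During the observation interval, with $\zeta(\tau) \triangleq \eta(\tau) - \hat{\xi}_1(\tau,\tau)$,

    $\dot{\hat{x}}(\tau,\tau) = S(\tau,\tau)\, A^T(\tau)\, N^{-1}(\tau)\, \zeta(\tau) = \epsilon^{-4} \begin{pmatrix} \sigma_{11} \\ \sigma_{12} \end{pmatrix} (\tau,\tau)\ \zeta(\tau)$,

    $\dot{S}(\tau,\tau) = -\epsilon^{-4} \begin{pmatrix} \sigma_{11}^2 & \sigma_{11}\sigma_{12} \\ \sigma_{11}\sigma_{12} & \sigma_{12}^2 \end{pmatrix} (\tau,\tau)$.

Both during and after the observation interval $\hat{x}'(\tau + \alpha,\tau) = B\hat{x}(\tau + \alpha,\tau)$, so that

    $\hat{\xi}_1'(\tau + \alpha,\tau) = \hat{\xi}_2(\tau + \alpha,\tau)$,

    $\hat{\xi}_2'(\tau + \alpha,\tau) = -\hat{\xi}_1(\tau + \alpha,\tau) - \sqrt{2}\, \hat{\xi}_2(\tau + \alpha,\tau)$.

Also

    $S'(\tau + \alpha,\tau) = BS(\tau + \alpha,\tau) + S(\tau + \alpha,\tau)\, B^T + Q = \begin{pmatrix} 2\sigma_{12} & \sigma_{22} - \sigma_{11} - \sqrt{2}\,\sigma_{12} \\ \sigma_{22} - \sigma_{11} - \sqrt{2}\,\sigma_{12} & 1 - 2\sigma_{12} - 2\sqrt{2}\,\sigma_{22} \end{pmatrix} (\tau + \alpha,\tau)$.

Equating to zero $dS(\tau,\tau)/d\tau = \dot{S}(\tau,\tau) + S'(\tau,\tau)$, one obtains the asymptotic solution as $\tau$ increases without limit:

    $\sigma_{11}(\infty,\infty) = \sqrt{2} \left[ \epsilon^3 (1 + \epsilon^4)^{1/4} - \epsilon^4 \right] = \sqrt{2}\, \epsilon^4 \theta$,

    $\sigma_{12}(\infty,\infty) = \epsilon^2 (1 + \epsilon^4)^{1/2} - 2\epsilon^3 (1 + \epsilon^4)^{1/4} + \epsilon^4 = \epsilon^4 \theta^2$,

    $\sigma_{22}(\infty,\infty) = \sqrt{2}\, \epsilon (1 + \epsilon^4)^{3/4} - 2\sqrt{2}\, \epsilon^2 (1 + \epsilon^4)^{1/2} + 2\sqrt{2}\, \epsilon^3 (1 + \epsilon^4)^{1/4} - \sqrt{2}\, \epsilon^4 = \sqrt{2}\, \epsilon^4 \theta (\theta^2 + \theta + 1)$,

with $\theta \triangleq \epsilon^{-1} (1 + \epsilon^4)^{1/4} - 1$. Substituting the asymptotic $S(\infty,\infty)$ into $d\hat{\xi}_1/d\tau$ and $d\hat{\xi}_2/d\tau$, one obtains the asymptotically optimum frequency operators on $\eta$; with $p = j\omega$,

    $\hat{\xi}_1(\tau,\tau) = \frac{\sqrt{2}\,\theta\, p + \theta(\theta + 2)}{(\theta + 1)^2 + \sqrt{2}\,(\theta + 1)\, p + p^2}\ \eta(\tau)$,

    $\hat{\xi}_2(\tau,\tau) = \frac{\theta^2 p - \sqrt{2}\,\theta}{(\theta + 1)^2 + \sqrt{2}\,(\theta + 1)\, p + p^2}\ \eta(\tau)$,

the asymptotically optimum operator for $\hat{\xi}_2(\tau,\tau)$ agreeing with the result in the reference.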
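The asymptotic variances of this example can be verified by substituting them into the three steady-state equations obtained from $\dot{S} + S' = 0$. The sketch below assumes $N = \epsilon^4$ and the $B$, $Q$, $A$ of the example; the function names and the value of $\epsilon$ are hypothetical.

```python
import math


def steady_state(eps):
    """Asymptotic variances in terms of theta = (1 + eps^4)^(1/4) / eps - 1."""
    theta = (1.0 + eps ** 4) ** 0.25 / eps - 1.0
    e4 = eps ** 4
    s11 = math.sqrt(2.0) * e4 * theta
    s12 = e4 * theta ** 2
    s22 = math.sqrt(2.0) * e4 * theta * (theta ** 2 + theta + 1.0)
    return s11, s12, s22


def residuals(eps):
    """Entries of S-dot + S' at the claimed steady state; all should vanish."""
    s11, s12, s22 = steady_state(eps)
    n = eps ** 4
    root2 = math.sqrt(2.0)
    return (2.0 * s12 - s11 * s11 / n,
            s22 - s11 - root2 * s12 - s11 * s12 / n,
            1.0 - 2.0 * s12 - 2.0 * root2 * s22 - s12 * s12 / n)


res = residuals(0.5)              # hypothetical noise level
```

The gains $\sigma_{11}/\epsilon^4 = \sqrt{2}\,\theta$ and $\sigma_{12}/\epsilon^4 = \theta^2$ are the quantities that appear as coefficients in the frequency operators.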


Example  IV. 3. c .  [Shlnbrot,  1956.]  A  particle  leaves  the  origin  at 

time  zero,  with  constant  velocity  of  mean  zero  and  variance  8.  One  ob- 
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serves  from  time  0  to  time  t  the  sum  of  particle  position  and  white  noise 
of  spectral  density  y. 


have 


l^t  the  particle  position  be  its  velocity  be 


x(0,0)  -  I  o\ 

,0 


8(0,0)  - 


c  r 


B  -  /O  1] 

,0  0, 


Q  -  0,  A  -  (1,0). 


With  a  2  0,  S(t  +  a,T)  4  (^n  °i2 


1^12  °22 


(t  +  a,T) 


During  observation 


x(t,t)  -  B(t,t)  Xn"^  C(t)  -  C(f)/Y  , 


12 


with  C(t)  -  r|(T)  -  ?j(t,t): 


S(t,t)  - 


11 


o,,0 


'’ll°12 


iri2  ^^12 


/  (t,t)> 


Always  ii'(T  +  a.t)  -  Bft(T  +  a.i)  -  /  +  ci,t)\; 


S'(t  +  a,T)  -  /2o 


12  "^22 


'22 


(t  +  a,T) 
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f 


t 

% 

\ 


! 

i 


Solving  dS(T,T)/dT  -  0  with  S(0,0)  “  /O  0 

,0  0 


S(t,t) - T 

3y  +  Bt 


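Predicting from time $\tau$ to time $\tau + \alpha$ we have

$$\hat{\xi}_1(\tau+\alpha,\tau) = \hat{\xi}_1(\tau,\tau) + \alpha\,\hat{\xi}_2(\tau,\tau),$$

$$\hat{\xi}_2(\tau+\alpha,\tau) = \hat{\xi}_2(\tau,\tau),$$

and from $S'(\tau+\alpha,\tau)$, or directly from the above pair,

$$S(\tau+\alpha,\tau) = \frac{3\gamma\beta}{3\gamma + \beta\tau^3}\begin{pmatrix} (\tau+\alpha)^2 & \tau+\alpha \\ \tau+\alpha & 1 \end{pmatrix}.$$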
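The closed form for Example IV.3.c can be checked numerically. The following is a minimal sketch, not part of the original report: it integrates the three scalar equations implied by $dS/d\tau = \dot{S} + S'$ for this example with a classical Runge-Kutta scheme and compares the result with $3\gamma\beta/(3\gamma+\beta\tau^3)$ times the matrix with entries $\tau^2$, $\tau$, $1$. The function names and the numerical values of $\beta$, $\gamma$, $\tau$ are illustrative assumptions.

```python
def riccati_rhs(s, gamma):
    """Right side of dS/dtau = S' + S_dot for B=[[0,1],[0,0]], Q=0, A=(1,0).

    s holds (sigma11, sigma12, sigma22); gamma is the white-noise density.
    """
    s11, s12, s22 = s
    return (
        2.0 * s12 - s11 * s11 / gamma,  # extrapolation 2*s12, observation -s11^2/gamma
        s22 - s11 * s12 / gamma,
        -s12 * s12 / gamma,
    )


def step(s, k, h):
    """Return s + h*k componentwise."""
    return tuple(x + h * dx for x, dx in zip(s, k))


def integrate(beta, gamma, tau, steps=2000):
    """Classical RK4 from S(0,0) = diag(0, beta) up to time tau."""
    s = (0.0, 0.0, beta)
    h = tau / steps
    for _ in range(steps):
        k1 = riccati_rhs(s, gamma)
        k2 = riccati_rhs(step(s, k1, 0.5 * h), gamma)
        k3 = riccati_rhs(step(s, k2, 0.5 * h), gamma)
        k4 = riccati_rhs(step(s, k3, h), gamma)
        s = tuple(x + h * (a + 2 * b + 2 * c + d) / 6.0
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
    return s


def closed_form(beta, gamma, tau, alpha=0.0):
    """(sigma11, sigma12, sigma22) of S(tau+alpha, tau) from the stated solution."""
    scale = 3.0 * gamma * beta / (3.0 * gamma + beta * tau ** 3)
    t = tau + alpha
    return (scale * t * t, scale * t, scale)


# Illustrative values only (assumed, not from the report).
beta, gamma, tau = 2.0, 0.5, 1.5
numeric = integrate(beta, gamma, tau)
exact = closed_form(beta, gamma, tau)
max_err = max(abs(a - b) for a, b in zip(numeric, exact))
print("numeric:", numeric)
print("closed :", exact)
print("max abs difference:", max_err)
```

Calling `closed_form` with a positive `alpha` gives the predicted covariance $S(\tau+\alpha,\tau)$, in which only the position entries grow with the prediction interval.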


Example IV.3.d. [Shinbrot, 1956.] Over a finite time interval beginning at time zero, one observes the sum of $\xi(\tau)$ and white noise of spectral density $\gamma$. $\xi(0)$ is zero; $\xi(\tau)$ is constant over the interval $(0,\tau_1)$ except for one jump of mean zero and variance $\beta\tau_1$, the time of the jump uniformly distributed between 0 and $\tau_1$; for $\tau > \tau_1$, $\xi(\tau) = \xi(\tau_1)$. The best linear estimate of $\xi(\tau)$ is desired.


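Here $A = 1$, $\hat{\xi}(0,0) = S(0,0) = 0$. We have $E\xi(\tau) = 0$,

$$E\xi^2(\tau) = \begin{cases} \beta\tau_1 \cdot (\tau/\tau_1) = \beta\tau & (\tau \le \tau_1), \\ \beta\tau_1 & (\tau \ge \tau_1). \end{cases}$$

Thus $B = 0$,

$$Q = \begin{cases} \beta & (\tau < \tau_1), \\ 0 & (\tau > \tau_1). \end{cases}$$

$$\dot{\hat{\xi}}(\tau,\tau) = S(\tau,\tau)\,\zeta(\tau)/\gamma, \qquad \dot{S}(\tau,\tau) = -S^2(\tau,\tau)/\gamma,$$

$$\hat{\xi}'(\tau+\alpha,\tau) = B\hat{\xi}(\tau+\alpha,\tau) = 0,$$

$$S'(\tau+\alpha,\tau) = \begin{cases} \beta & (\tau+\alpha < \tau_1), \\ 0 & (\tau+\alpha > \tau_1). \end{cases}$$

For $\tau \le \tau_1$, $dS(\tau,\tau)/d\tau = \beta - S^2(\tau,\tau)/\gamma$, so that

$$S(\tau,\tau) = (\beta\gamma)^{1/2}\tanh\left[(\beta/\gamma)^{1/2}\,\tau\right], \qquad (\tau \le \tau_1).$$

For $\tau \ge \tau_1$, $dS(\tau,\tau)/d\tau = -S^2(\tau,\tau)/\gamma$, giving

$$S(\tau,\tau) = \frac{\gamma\,S(\tau_1,\tau_1)}{\gamma + (\tau-\tau_1)\,S(\tau_1,\tau_1)}, \qquad (\tau \ge \tau_1).$$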


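For prediction from time $\tau$ to time $(\tau + \alpha)$ we have $\hat{\xi}(\tau+\alpha,\tau) = \hat{\xi}(\tau,\tau)$,

$$S(\tau+\alpha,\tau) = S(\tau,\tau) + \beta\varphi,$$

with $\varphi = \min(\tau_1 - \tau, \alpha)$ if $\tau < \tau_1$, and $\varphi$ zero otherwise.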
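The piecewise variance of Example IV.3.d is easy to tabulate. The sketch below is illustrative rather than taken from the report: it codes the two branches of $S(\tau,\tau)$ and the prediction rule, and uses a central difference to confirm that each branch satisfies $dS/d\tau = Q - S^2/\gamma$ with $Q = \beta$ before $\tau_1$ and $Q = 0$ after. The function names and sample values are assumptions.

```python
import math


def filtered_variance(tau, beta, gamma, tau1):
    """S(tau, tau): tanh branch up to tau1, rational decay afterwards."""
    if tau <= tau1:
        return math.sqrt(beta * gamma) * math.tanh(math.sqrt(beta / gamma) * tau)
    s1 = filtered_variance(tau1, beta, gamma, tau1)
    return gamma * s1 / (gamma + (tau - tau1) * s1)


def predicted_variance(tau, alpha, beta, gamma, tau1):
    """S(tau+alpha, tau) = S(tau, tau) + beta*phi, phi = min(tau1 - tau, alpha)."""
    phi = min(tau1 - tau, alpha) if tau < tau1 else 0.0
    return filtered_variance(tau, beta, gamma, tau1) + beta * phi


def riccati_residual(tau, beta, gamma, tau1, h=1e-5):
    """Central-difference dS/dtau minus (Q - S^2/gamma); should be near zero."""
    q = beta if tau < tau1 else 0.0
    s = filtered_variance(tau, beta, gamma, tau1)
    ds = (filtered_variance(tau + h, beta, gamma, tau1)
          - filtered_variance(tau - h, beta, gamma, tau1)) / (2.0 * h)
    return ds - (q - s * s / gamma)


# Illustrative values only (assumed).
beta, gamma, tau1 = 2.0, 0.5, 1.0
for tau in (0.4, 1.7):
    print(tau,
          filtered_variance(tau, beta, gamma, tau1),
          riccati_residual(tau, beta, gamma, tau1))
```

The residual is evaluated away from $\tau_1$, where the derivative of $S(\tau,\tau)$ changes abruptly because $Q$ drops from $\beta$ to zero.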
In this section we consider observations continuous over an interval, in fact with $A(\tau)$ possessing a derivative $A'(\tau) \triangleq dA(\tau)/d\tau$. The processing of data in such a case cannot be by numerical methods, and it typically uses electronic equipment. A good example is the processing of data considered by Wiener (1949), in which processing is specified in terms of a constant coefficient frequency operator on continuous inputs. In general, the best estimate on the basis of continuous observations will require continuous data processing, but can often be approximated well enough by arithmetic methods in which the observations are lumped into approximate discrete observations or even simply sampled. Once this approximation has been made, the methods of Section III can be employed. From a mathematical standpoint, it is not advisable to ignore the case of continuous observations, as the study may shed light on theoretical points as well as practical points involving techniques and accuracy of approximation. From the standpoint of applications, the methods of processing continuous data are of great interest, since in many cases continuous processing may be cheaper, faster, and/or more precise than digital processing, despite the tremendous progress in digital computers.

In Section IV a fictional limiting case involving continuous observations with superposed white noise was considered and solved, but our primary interest is in realistic situations involving observations which are finite with probability one. We must now require that continuous $A(\tau)$ be differentiable. The simplest case is considered first, in which the matrix $A(\tau)Q(\tau)\bar{A}(\tau)$ is nonsingular. This condition is satisfied if each scalar observation includes a simple Markov component, unless there is redundancy in the observations. In general, it is presumed that trivial singularities resulting from redundancy in the matrix $A(\tau)$ are removed by routine methods. The next case considered is that in which the observation function is scalar, $A(\tau)$ consisting of a single row vector $a(\tau)$, but with $a(\tau)Q(\tau)\bar{a}(\tau)$ possibly singular.

V.1. $A(\tau)Q(\tau)\bar{A}(\tau)$ Nonsingular.

With continuous $A(\tau)$, it is natural to attempt to apply Theorems 2 and 3, with the time between successive observations shrinking to zero. The difficulty that arises is that $S(\tau,\tau)\bar{A}(\tau)$ is zero, as can be seen from (2.1.6), so that the estimate of the error of estimate given by (2.1.5) is a meaningless expression involving the product of a zero matrix times the inverse of a zero matrix. We must therefore obtain the limit of (2.1.5) as the time interval $\delta$ between the observations $y(\tau)$ and $y(\tau+\delta)$ shrinks to zero. This limit exists provided $A'(\tau)$ and $[A(\tau)Q(\tau)\bar{A}(\tau)]^{-1}$ exist at $\tau$, as shown in the following theorem. (In Theorem 4, $y(\tau)$ is a column vector with $\nu$ components, $x(\tau)$ and $\hat{x}(\tau,\tau)$ column vectors with $\mu$ components. $A(\tau)$ is $\nu \times \mu$; $B(\tau)$, $Q(\tau)$, and $S(\tau,\tau)$ are $\mu \times \mu$.)


Theorem 4. Given the hypotheses of Theorems 2 and 3, with $A(\tau)$ a specified differentiable function, $A(\tau)Q(\tau)\bar{A}(\tau)$ nonsingular. Let $\dot{\hat{x}}(\tau)$ represent the increment in $\hat{x}(\tau,\tau)$ due to the observation $y(\tau) = A(\tau)x(\tau)$, $\dot{S}(\tau,\tau)$ the time derivative of $S(\tau,\tau)$ due to contributions $\dot{\hat{x}}(\tau)$, and $z(\tau) = y(\tau) - A(\tau)\hat{x}(\tau,\tau)$. Then

(4.1.1)  $$\dot{\hat{x}}(\tau) = [S(\tau,\tau)\bar{A}(\tau)]'\,\{[A(\tau)S(\tau,\tau)\bar{A}(\tau)]'\}^{-1}\,z(\tau),$$

(4.1.2)  $$\dot{S}(\tau,\tau) = -[S(\tau,\tau)\bar{A}(\tau)]'\,\{[A(\tau)S(\tau,\tau)\bar{A}(\tau)]'\}^{-1}\,[A(\tau)S(\tau,\tau)]',$$

with

(4.1.3)  $$[S(\tau,\tau)\bar{A}(\tau)]' = S(\tau,\tau)\bar{A}'(\tau) + S(\tau,\tau)\bar{B}(\tau)\bar{A}(\tau) + Q(\tau)\bar{A}(\tau),$$

(4.1.4)  $$[A(\tau)S(\tau,\tau)\bar{A}(\tau)]' = A(\tau)Q(\tau)\bar{A}(\tau).$$

Proof. From (2.1.6) of Theorem 2, $S(\tau,\tau)\bar{A}(\tau)$ is zero. From (3.1.2) of Theorem 3,†

(4.2.1)  $$S'(\tau,\tau) = B(\tau)S(\tau,\tau) + S(\tau,\tau)\bar{B}(\tau) + Q(\tau).$$

†Recall that $F'(\tau,\alpha) \triangleq dF(\tau,\alpha)/d\tau$.

Since $S(\tau,\tau)\bar{A}(\tau) = 0 = A(\tau)S(\tau,\tau)$, (4.2.1) yields

$$[S(\tau,\tau)\bar{A}(\tau)]' = S(\tau,\tau)\bar{A}'(\tau) + B(\tau)\cdot 0 + S(\tau,\tau)\bar{B}(\tau)\bar{A}(\tau) + Q(\tau)\bar{A}(\tau),$$

which is (4.1.3). Similarly, (4.1.4) is obtained from (4.2.1) and (4.1.3) by

$$[A(\tau)S(\tau,\tau)\bar{A}(\tau)]' = A'(\tau)\cdot 0 + A(\tau)[S(\tau,\tau)\bar{A}(\tau)]' = A(\tau)S'(\tau,\tau)\bar{A}(\tau) = A(\tau)Q(\tau)\bar{A}(\tau).$$

To determine $\dot{\hat{x}}(\tau)$, we have from Theorem 2

(4.2.2)  $$\hat{x}(\tau+\delta,\tau+\delta) - \hat{x}(\tau+\delta,\tau) = S(\tau+\delta,\tau)\bar{A}(\tau+\delta)\cdot[A(\tau+\delta)S(\tau+\delta,\tau)\bar{A}(\tau+\delta)]^{-1}\cdot[y(\tau+\delta) - A(\tau+\delta)\hat{x}(\tau+\delta,\tau)].$$

Now $\dot{\hat{x}}(\tau)$ is the limit as $\delta \to 0$ of (4.2.2). From Theorem 3 and the differentiability of $A(\tau)$ we have that $y(\tau+\delta) - A(\tau+\delta)\hat{x}(\tau+\delta,\tau) = A(\tau+\delta)[x(\tau+\delta) - \hat{x}(\tau+\delta,\tau)]$ is continuous at $\delta = 0$ if $w(\tau+\delta,\tau)$ is continuous at $\delta = 0$. (3.0.3) indicates that the mean square of $w(\tau+\delta,\tau)$ is continuous and tends to zero as $\delta$ tends to zero, so that $w(\tau+\delta,\tau)$ is continuous with probability one. Thus the final factor on the right of (4.2.2) tends to the limit $z(\tau)$ as $\delta$ approaches zero. The limit of the other factors on the right of (4.2.2) is as given in (4.1.1), by the Mean Value Theorem, in view of (4.1.3), (4.1.4), and the hypotheses on the matrices involved. In the same manner, the limit of $[S(\tau+\delta,\tau+\delta) - S(\tau+\delta,\tau)]/\delta$ is seen to be given by (4.1.2).
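The limiting argument in the proof can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the report's own computation: it takes a two-component system with $B$ having rows $(0,1)$ and $(-1,-\sqrt{2})$, $Q$ with a single unit entry in the lower right, and $A = (1,1)$, starts from a covariance with $S\bar{A} = 0$, extrapolates it over a short interval $\delta$ by $S' = BS + S\bar{B} + Q$, and compares the discrete factor $S\bar{A}[AS\bar{A}]^{-1}$ of (4.2.2) with the limit $[S\bar{B} + Q]\bar{A}(AQ\bar{A})^{-1}$ implied by (4.1.3) and (4.1.4) for constant $A$. The helper names and the value $\sigma = 0.1$ are hypothetical.

```python
import math

ROOT2 = math.sqrt(2.0)
B = [[0.0, 1.0], [-1.0, -ROOT2]]
Q = [[0.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def axpy(s, k, h):
    """Return s + h*k for 2x2 matrices stored as nested lists."""
    return [[x + h * dx for x, dx in zip(rs, rk)] for rs, rk in zip(s, k)]


def extrapolation_rate(s):
    """S' = B S + S B^T + Q."""
    bs = matmul(B, s)
    sbt = matmul(s, transpose(B))
    return [[bs[i][j] + sbt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]


def extrapolate(s, delta, steps=50):
    """RK4 integration of S' over an interval of length delta."""
    h = delta / steps
    for _ in range(steps):
        k1 = extrapolation_rate(s)
        k2 = extrapolation_rate(axpy(s, k1, 0.5 * h))
        k3 = extrapolation_rate(axpy(s, k2, 0.5 * h))
        k4 = extrapolation_rate(axpy(s, k3, h))
        s = [[s[i][j] + h * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j]) / 6.0
              for j in range(2)] for i in range(2)]
    return s


def discrete_gain(s_pred):
    """S A^T (A S A^T)^{-1}, the matrix factor on the right of (4.2.2)."""
    s_at = matmul(s_pred, transpose(A))
    denom = matmul(A, s_at)[0][0]
    return [row[0] / denom for row in s_at]


def limiting_gain(s):
    """(S B^T + Q) A^T (A Q A^T)^{-1}, valid here because A is constant."""
    sbt_q = axpy(matmul(s, transpose(B)), Q, 1.0)
    num = matmul(sbt_q, transpose(A))
    denom = matmul(matmul(A, Q), transpose(A))[0][0]
    return [row[0] / denom for row in num]


sigma = 0.1  # assumed value; any S with S A^T = 0 would do
S = [[sigma, -sigma], [-sigma, sigma]]
limit = limiting_gain(S)
gains = {delta: discrete_gain(extrapolate(S, delta)) for delta in (1e-1, 1e-2, 1e-3)}
for delta, g in gains.items():
    print(delta, g, limit)
```

As $\delta$ shrinks, numerator and denominator of the discrete factor both tend to zero, while their ratio settles on the limiting value.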


Example V.1.a. [Shinbrot, 1956.] A particle leaves the origin at time zero, with constant velocity of mean zero, variance $\beta$. One observes, from time zero to some finite time, the sum of particle position and a disturbance $\xi_3(\tau)$ of mean zero and $E\,\xi_3(\tau_1)\xi_3(\tau_2) = \gamma e^{-\varphi|\tau_1-\tau_2|}$, with $\gamma > 0$, $\varphi > 0$. (A limiting case, with $\xi_3(\tau)$ replaced by white noise, was considered in Example IV.3.c.)

The product moment function for $\xi_3(\tau_1)$ with $\xi_3(\tau_2)$ shows that the constant variance of $\xi_3$ is $\gamma$. Comparing the product moment function with $e^{-\varphi|\tau_1-\tau_2|}\,E\,\xi_3^2(\tau_1)$, obtained from (3.0.2) on multiplication by $\xi_3(\tau_1)$ and taking expectations, shows that the element of $B$ corresponding to $\xi_3$ is $-\varphi$. Finally, the element of $Q$ corresponding to $\xi_3$ is $2\gamma\varphi$, by $S' = BS + S\bar{B} + Q$, since the variance of $\xi_3$ is constant.

Here we have $\xi_1(\tau)$ = position, $\xi_2(\tau)$ = velocity, $\xi_3(\tau)$ = disturbance.


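$$B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -\varphi \end{pmatrix}, \quad Q = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2\gamma\varphi \end{pmatrix}, \quad A = (1, 0, 1).$$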


Following the initial observation at time 0, when $A(\tau)$ jumped from 0 to $(1, 0, 1)$, during the observation interval $A(\tau)$ is constant and $A(\tau)Q(\tau)\bar{A}(\tau) = 2\gamma\varphi > 0$, so that the hypotheses of Theorem 4 are satisfied. The initial observation must be used as shown in Section III. Assuming no information other than that of the problem statement, we have $\hat{x}(0,0-) = 0$,

$$S(0,0-) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{pmatrix}.$$


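At the initial observation,

$$\dot{\hat{x}}(0) = S(0,0-)\bar{A}(0)\,[A(0)S(0,0-)\bar{A}(0)]^{-1}\,\eta(0) = \begin{pmatrix} 0 \\ 0 \\ \gamma \end{pmatrix}\gamma^{-1}\eta(0) = \begin{pmatrix} 0 \\ 0 \\ \eta(0) \end{pmatrix},$$

i.e., the initial observation specifies the value of $\xi_3(0)$. From this or, if preferred, more formally, one has

$$S(0,0) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Thus the continuous observation has initial values

$$\hat{x}(0,0) = \begin{pmatrix} 0 \\ 0 \\ \eta(0) \end{pmatrix}, \quad S(0,0) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Since $S(\tau,\tau)\bar{A}(\tau) = 0$ during the observation interval, we write

$$S(\tau,\tau) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & -\sigma_{11} \\ \sigma_{12} & \sigma_{22} & -\sigma_{12} \\ -\sigma_{11} & -\sigma_{12} & \sigma_{11} \end{pmatrix}(\tau,\tau).$$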

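$$\dot{\hat{x}}(\tau) = [S(\tau,\tau)\bar{A}]'\,(AQ\bar{A})^{-1}\,\zeta(\tau) = \frac{1}{2\gamma\varphi}\begin{pmatrix} \sigma_{12} + \varphi\sigma_{11} \\ \sigma_{22} + \varphi\sigma_{12} \\ 2\gamma\varphi - \sigma_{12} - \varphi\sigma_{11} \end{pmatrix}(\tau,\tau)\;\zeta(\tau),$$

$$\dot{S}(\tau,\tau) = -\frac{1}{2\gamma\varphi}\begin{pmatrix} (\sigma_{12}+\varphi\sigma_{11})^2 & (\sigma_{12}+\varphi\sigma_{11})(\sigma_{22}+\varphi\sigma_{12}) & (\sigma_{12}+\varphi\sigma_{11})(2\gamma\varphi-\sigma_{12}-\varphi\sigma_{11}) \\ \cdot & (\sigma_{22}+\varphi\sigma_{12})^2 & (\sigma_{22}+\varphi\sigma_{12})(2\gamma\varphi-\sigma_{12}-\varphi\sigma_{11}) \\ \cdot & \cdot & (2\gamma\varphi-\sigma_{12}-\varphi\sigma_{11})^2 \end{pmatrix}(\tau,\tau).$$

Both during and after the observation interval,

$$\hat{x}'(\tau+\alpha,\tau) = B\hat{x}(\tau+\alpha,\tau) = \begin{pmatrix} \hat{\xi}_2(\tau+\alpha,\tau) \\ 0 \\ -\varphi\,\hat{\xi}_3(\tau+\alpha,\tau) \end{pmatrix};$$

$$S'(\tau+\alpha,\tau) = BS(\tau+\alpha,\tau) + S(\tau+\alpha,\tau)\bar{B} + Q,$$

so that during the observation interval

$$S'(\tau,\tau) = \begin{pmatrix} 2\sigma_{12} & \sigma_{22} & \varphi\sigma_{11} - \sigma_{12} \\ \sigma_{22} & 0 & \varphi\sigma_{12} \\ \varphi\sigma_{11} - \sigma_{12} & \varphi\sigma_{12} & 2\varphi(\gamma - \sigma_{11}) \end{pmatrix}(\tau,\tau).$$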

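Solving $dS(\tau,\tau)/d\tau = \dot{S}(\tau,\tau) + S'(\tau,\tau)$, with $S(0,0)$ as specified, one obtains

$$S(\tau,\tau) = \frac{6\gamma\varphi\beta}{6\gamma\varphi + 3\beta\tau + 3\varphi\beta\tau^2 + \varphi^2\beta\tau^3}\begin{pmatrix} \tau^2 & \tau & -\tau^2 \\ \tau & 1 & -\tau \\ -\tau^2 & -\tau & \tau^2 \end{pmatrix}.$$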


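The problem of sequential estimation has been solved. Possible confusion may be avoided by calculating also the prediction from time $\tau$ to time $\tau + \alpha$, and then taking into account an observation at time $\tau + \alpha$, assuming the continuous observation ceased at time $\tau$. The predicted vector is, from $\hat{x}'(\tau+\alpha,\tau)$,

$$\hat{\xi}_1(\tau+\alpha,\tau) = \hat{\xi}_1(\tau,\tau) + \alpha\,\hat{\xi}_2(\tau,\tau),$$

$$\hat{\xi}_2(\tau+\alpha,\tau) = \hat{\xi}_2(\tau,\tau),$$

$$\hat{\xi}_3(\tau+\alpha,\tau) = \hat{\xi}_3(\tau,\tau)\cdot e^{-\varphi\alpha}.$$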


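During the prediction interval, we do not have $S(\tau+\alpha,\tau)$ constrained by $A(\tau)$:

$$S'(\tau+\alpha,\tau) = BS(\tau+\alpha,\tau) + S(\tau+\alpha,\tau)\bar{B} + Q = \begin{pmatrix} 2\sigma_{12} & \sigma_{22} & \sigma_{23} - \varphi\sigma_{13} \\ \cdot & 0 & -\varphi\sigma_{23} \\ \cdot & \cdot & 2\varphi(\gamma - \sigma_{33}) \end{pmatrix}(\tau+\alpha,\tau).$$

Writing $\varepsilon(\tau) = 6\gamma\varphi\beta/(6\gamma\varphi + 3\beta\tau + 3\varphi\beta\tau^2 + \varphi^2\beta\tau^3)$, the solution of the differential equation, with initial condition $S(\tau,\tau)$, is

$$S(\tau+\alpha,\tau) = \begin{pmatrix} (\tau+\alpha)^2\,\varepsilon(\tau) & (\tau+\alpha)\,\varepsilon(\tau) & -\tau(\tau+\alpha)\,e^{-\varphi\alpha}\,\varepsilon(\tau) \\ \cdot & \varepsilon(\tau) & -\tau\,e^{-\varphi\alpha}\,\varepsilon(\tau) \\ \cdot & \cdot & \gamma - e^{-2\varphi\alpha}[\gamma - \tau^2\varepsilon(\tau)] \end{pmatrix}.$$

For an observation at time $(\tau + \alpha)$,

$$\dot{\hat{x}}(\tau+\alpha) = S(\tau+\alpha,\tau)\bar{A}(\tau+\alpha)\,[A(\tau+\alpha)S(\tau+\alpha,\tau)\bar{A}(\tau+\alpha)]^{-1}\,\zeta(\tau+\alpha),$$

$$S(\tau+\alpha,\tau+\alpha) - S(\tau+\alpha,\tau) = -S(\tau+\alpha,\tau)\bar{A}(\tau+\alpha)\,[A(\tau+\alpha)S(\tau+\alpha,\tau)\bar{A}(\tau+\alpha)]^{-1}\,A(\tau+\alpha)S(\tau+\alpha,\tau).$$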
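Two checks on Example V.1.a can be sketched in a few lines. The first uses central differences to confirm that $\sigma_{11} = \tau^2\varepsilon$, $\sigma_{12} = \tau\varepsilon$, $\sigma_{22} = \varepsilon$ satisfy the constrained equations for $dS/d\tau$. The second compares $\varepsilon(\tau)$ for very large $\varphi$ with the white-noise result of Example IV.3.c, under the assumption (made here, not stated in the report) that the matching white-noise density is $2\gamma/\varphi$. Function names and sample values are illustrative.

```python
def epsilon(tau, beta, gamma, phi):
    """Velocity variance of estimate after observing over (0, tau)."""
    denom = (6.0 * gamma * phi + 3.0 * beta * tau
             + 3.0 * phi * beta * tau ** 2 + phi ** 2 * beta * tau ** 3)
    return 6.0 * gamma * phi * beta / denom


def sigma(tau, beta, gamma, phi):
    """(sigma11, sigma12, sigma22) of the constrained S(tau, tau)."""
    e = epsilon(tau, beta, gamma, phi)
    return (e * tau * tau, e * tau, e)


def rate(s, gamma, phi):
    """dS/dtau = S' + S_dot restricted to the position/velocity block."""
    s11, s12, s22 = s
    g1 = s12 + phi * s11          # first component of [S A^T]'
    g2 = s22 + phi * s12          # second component of [S A^T]'
    d = 2.0 * gamma * phi         # A Q A^T
    return (2.0 * s12 - g1 * g1 / d, s22 - g1 * g2 / d, -g2 * g2 / d)


def residuals(tau, beta, gamma, phi, h=1e-5):
    """Central-difference derivative of the closed form minus the model rate."""
    plus = sigma(tau + h, beta, gamma, phi)
    minus = sigma(tau - h, beta, gamma, phi)
    model = rate(sigma(tau, beta, gamma, phi), gamma, phi)
    return tuple((p - m) / (2.0 * h) - r for p, m, r in zip(plus, minus, model))


def white_noise_variance(tau, beta, density):
    """Velocity variance from Example IV.3.c with white noise of given density."""
    return 3.0 * density * beta / (3.0 * density + beta * tau ** 3)


# Illustrative values only (assumed).
print(residuals(1.2, 2.0, 0.5, 0.8))

density, phi_large = 0.5, 1.0e6
gamma_large = density * phi_large / 2.0   # assumption: density = 2*gamma/phi
print(epsilon(1.5, 2.0, gamma_large, phi_large),
      white_noise_variance(1.5, 2.0, density))
```

The two printed variances agree closely for large $\varphi$, consistent with the remark that Example IV.3.c is a limiting case of this example.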

Example V.1.b. Wiener, 1949, considers observations over $[-\infty, \tau_1]$ on a process with spectral density $\Phi(\omega) = (1 + \omega^2)/(1 + \omega^4)$, and wishes to predict the value of the process at time $(\tau_1 + \alpha)$. This is similar to Example IV.2.b, except that here $A = (1, 1)$ or perhaps $(1, -1)$, so that this is not a case of simple prediction. We shall consider separately Case I, in which $A = (1, 1)$, and Case II, in which $A = (1, -1)$; in each case we assume observation over an interval beginning at time zero, with initial estimates zero and variances of estimate the variances of the processes; namely

$$S(0,0-) = \begin{pmatrix} \sqrt{2}/4 & 0 \\ 0 & \sqrt{2}/4 \end{pmatrix}.$$

The initial observation of $\eta(0)$ gives estimates $\hat{x}(0,0) = S(0,0-)\bar{A}\,[AS(0,0-)\bar{A}]^{-1}\,\eta(0)$ and $S(0,0) = S(0,0-) - S(0,0-)\bar{A}\,[AS(0,0-)\bar{A}]^{-1}\,AS(0,0-)$, or

Case I:
$$\hat{x}(0,0) = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}\eta(0); \quad S(0,0) = \begin{pmatrix} \sqrt{2}/8 & -\sqrt{2}/8 \\ -\sqrt{2}/8 & \sqrt{2}/8 \end{pmatrix}.$$

Case II:
$$\hat{x}(0,0) = \begin{pmatrix} 1/2 \\ -1/2 \end{pmatrix}\eta(0); \quad S(0,0) = \begin{pmatrix} \sqrt{2}/8 & \sqrt{2}/8 \\ \sqrt{2}/8 & \sqrt{2}/8 \end{pmatrix}.$$

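During the observation interval, $S(\tau,\tau)\bar{A} = 0$ implies that

$$S(\tau,\tau) = \begin{pmatrix} \sigma & -\sigma \\ -\sigma & \sigma \end{pmatrix} \text{ in Case I}, \qquad \begin{pmatrix} \sigma & \sigma \\ \sigma & \sigma \end{pmatrix} \text{ in Case II}.$$

Recalling that $B = \begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix}$, $Q = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$, we compute

$$\dot{\hat{x}}(\tau) = [S(\tau,\tau)\bar{B} + Q]\,\bar{A}\,(AQ\bar{A})^{-1}\,\zeta(\tau), \qquad S'(\tau,\tau) = S(\tau,\tau)\bar{B} + BS(\tau,\tau) + Q,$$

$$\dot{S}(\tau,\tau) = -[S(\tau,\tau)\bar{B} + Q]\,\bar{A}\,(AQ\bar{A})^{-1}\,A\,[BS(\tau,\tau) + Q],$$

and $\hat{x}'(\tau,\tau) = B\hat{x}(\tau,\tau)$, obtaining

$$\hat{x}'(\tau,\tau) = \begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix}\hat{x}(\tau,\tau)$$

in both cases.

In Case I,

$$\dot{\hat{x}}(\tau) = \begin{pmatrix} -(2-\sqrt{2})\sigma \\ 1 + (2-\sqrt{2})\sigma \end{pmatrix}\zeta(\tau),$$

$$\dot{S}(\tau,\tau) = -\begin{pmatrix} (2-\sqrt{2})^2\sigma^2 & -(2-\sqrt{2})\sigma - (2-\sqrt{2})^2\sigma^2 \\ -(2-\sqrt{2})\sigma - (2-\sqrt{2})^2\sigma^2 & [1 + (2-\sqrt{2})\sigma]^2 \end{pmatrix},$$

$$S'(\tau,\tau) = \begin{pmatrix} -2\sigma & \sqrt{2}\sigma \\ \sqrt{2}\sigma & 1 + 2(1-\sqrt{2})\sigma \end{pmatrix}.$$

The solution is defined in terms of $\sigma(\tau)$, which from $dS/d\tau = S' + \dot{S}$ satisfies $d\sigma/d\tau = -2\sigma - (2-\sqrt{2})^2\sigma^2$ with initial value $\sqrt{2}/8$. The solution for $\sigma(\tau)$ is

$$\sigma(\tau) = [(3 + 2\sqrt{2})(e^{2\tau} - 1) + 4\sqrt{2}]^{-1},$$

tending to zero as $\tau$ increases.

In Case II,

$$\dot{\hat{x}}(\tau) = \begin{pmatrix} (2+\sqrt{2})\sigma \\ (2+\sqrt{2})\sigma - 1 \end{pmatrix}\zeta(\tau),$$

$$\dot{S}(\tau,\tau) = -\begin{pmatrix} (2+\sqrt{2})^2\sigma^2 & (2+\sqrt{2})^2\sigma^2 - (2+\sqrt{2})\sigma \\ (2+\sqrt{2})^2\sigma^2 - (2+\sqrt{2})\sigma & [(2+\sqrt{2})\sigma - 1]^2 \end{pmatrix},$$

$$S'(\tau,\tau) = \begin{pmatrix} 2\sigma & -\sqrt{2}\sigma \\ -\sqrt{2}\sigma & 1 - 2\sigma(1+\sqrt{2}) \end{pmatrix}.$$

In Case II $d\sigma/d\tau = 2\sigma - (2+\sqrt{2})^2\sigma^2$, which with $\sigma(0) = \sqrt{2}/8$ yields

$$\sigma(\tau) = [(3 - 2\sqrt{2})(1 - e^{-2\tau}) + 4\sqrt{2}]^{-1},$$

tending to $3 - 2\sqrt{2}$ as $\tau$ increases.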
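The extrapolation to time $\tau_1 + \alpha$ is in either case given by

$$\hat{x}'(\tau_1+\alpha,\tau_1) = B\hat{x}(\tau_1+\alpha,\tau_1) = \begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix}\hat{x}(\tau_1+\alpha,\tau_1),$$

yielding estimates of the same form as in Example IV.2.b, with $\eta(\tau)$ and $\eta'(\tau)$ replaced by $\hat{\xi}_1(\tau_1,\tau_1)$ and $\hat{\xi}_2(\tau_1,\tau_1)$ respectively. Letting $\sigma_{ij}$ represent the covariance of estimate of $\xi_i$ and $\xi_j$, we have the same differential equation for $S(\tau_1+\alpha,\tau_1)$ as in Example IV.2.b, but different initial conditions since $S(\tau_1,\tau_1)$ is non-zero. Thus $S(\tau_1+\alpha,\tau_1)$ is as in IV.2.b, together with the increments due to $\sigma(\tau_1)$:

Case I

$$\delta\sigma_{11}(\tau_1+\alpha,\tau_1) = \sigma(\tau_1)\,e^{-\sqrt{2}\alpha}\{(2-\sqrt{2}) + (1-\sqrt{2})(\sin\sqrt{2}\alpha - \cos\sqrt{2}\alpha)\},$$

$$\delta\sigma_{12}(\tau_1+\alpha,\tau_1) = \sigma(\tau_1)\,e^{-\sqrt{2}\alpha}\{(1-\sqrt{2}) + \sqrt{2}(1-\sqrt{2})\cos\sqrt{2}\alpha\},$$

$$\delta\sigma_{22}(\tau_1+\alpha,\tau_1) = \sigma(\tau_1)\,e^{-\sqrt{2}\alpha}\{(2-\sqrt{2}) - (1-\sqrt{2})(\sin\sqrt{2}\alpha + \cos\sqrt{2}\alpha)\},$$

or Case II

$$\delta\sigma_{11}(\tau_1+\alpha,\tau_1) = \sigma(\tau_1)\,e^{-\sqrt{2}\alpha}\{(2+\sqrt{2}) + (1+\sqrt{2})(\sin\sqrt{2}\alpha - \cos\sqrt{2}\alpha)\},$$

$$\delta\sigma_{12}(\tau_1+\alpha,\tau_1) = \sigma(\tau_1)\,e^{-\sqrt{2}\alpha}\{\sqrt{2}(1+\sqrt{2})\cos\sqrt{2}\alpha - (1+\sqrt{2})\},$$

$$\delta\sigma_{22}(\tau_1+\alpha,\tau_1) = \sigma(\tau_1)\,e^{-\sqrt{2}\alpha}\{(2+\sqrt{2}) - (1+\sqrt{2})(\sin\sqrt{2}\alpha + \cos\sqrt{2}\alpha)\}.$$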

The non-equality of $S(\tau_1+\alpha,\tau_1)$ in Cases I and II is natural, although somewhat obscured in treatments merely estimating $\eta(\tau_1+\alpha)$. The variance of estimate of $\eta(\tau_1+\alpha)$ is of course the same in Case II as in Case I, namely,

$$\frac{\sqrt{2}}{2}\{1 - e^{-\sqrt{2}\alpha}[(2-\sqrt{2}) - (1-\sqrt{2})\cos\sqrt{2}\alpha]\} + \frac{e^{-\sqrt{2}\alpha}\,2(3-2\sqrt{2})(1 - \cos\sqrt{2}\alpha)}{(3+2\sqrt{2})(e^{2\tau_1} - 1) + 4\sqrt{2}}.$$


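For $\tau$ large, $\sigma(\tau)$ will be approximately equal to its asymptotic value of zero in Case I, or $3 - 2\sqrt{2}$ in Case II. The corresponding asymptotic values of $\dot{\hat{x}}$ are:

Case I, $\dot{\hat{x}} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\zeta$;

Case II, $\dot{\hat{x}} = \begin{pmatrix} 2-\sqrt{2} \\ 1-\sqrt{2} \end{pmatrix}\zeta$.

Combining these with $\hat{x}' = B\hat{x}$, one obtains the asymptotically optimum frequency operators on $\eta(\tau)$ for $\hat{\xi}_1$ and $\hat{\xi}_2$:

Case I: $\hat{\xi}_1 = (1 + j\omega)^{-1}\,\eta$, $\quad \hat{\xi}_2 = j\omega\,(1 + j\omega)^{-1}\,\eta$;

Case II: $\hat{\xi}_1 = (1 + j\omega)^{-1}[(2-\sqrt{2})\,j\omega + (\sqrt{2}-1)]\,\eta$,
$\quad \hat{\xi}_2 = -(1 + j\omega)^{-1}[(\sqrt{2}-1)\,j\omega + (2-\sqrt{2})]\,\eta$.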
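The scalar results of Example V.1.b lend themselves to a quick numerical check. The sketch below is illustrative and not taken from the report: it verifies by central differences that each closed form for $\sigma(\tau)$ satisfies its differential equation, evaluates the limiting gain vectors, and confirms that the Case I and Case II frequency operators reproduce the observation ($\hat{\xi}_1 + \hat{\xi}_2 = \eta$ and $\hat{\xi}_1 - \hat{\xi}_2 = \eta$ respectively). The function names and test frequencies are assumptions.

```python
import math

ROOT2 = math.sqrt(2.0)


def sigma_case1(tau):
    return 1.0 / ((3.0 + 2.0 * ROOT2) * (math.exp(2.0 * tau) - 1.0) + 4.0 * ROOT2)


def sigma_case2(tau):
    return 1.0 / ((3.0 - 2.0 * ROOT2) * (1.0 - math.exp(-2.0 * tau)) + 4.0 * ROOT2)


def rate_case1(s):
    return -2.0 * s - (2.0 - ROOT2) ** 2 * s * s


def rate_case2(s):
    return 2.0 * s - (2.0 + ROOT2) ** 2 * s * s


def gain_case1(s):
    """Multiplier of zeta in the increment of the estimate, Case I."""
    return (-(2.0 - ROOT2) * s, 1.0 + (2.0 - ROOT2) * s)


def gain_case2(s):
    """Multiplier of zeta in the increment of the estimate, Case II."""
    return ((2.0 + ROOT2) * s, (2.0 + ROOT2) * s - 1.0)


def ode_residual(sigma, rate, tau, h=1e-5):
    """Central-difference d(sigma)/d(tau) minus the stated right-hand side."""
    return (sigma(tau + h) - sigma(tau - h)) / (2.0 * h) - rate(sigma(tau))


def operators_case1(omega):
    """Asymptotic operators on eta for (xi1_hat, xi2_hat), Case I."""
    jw = 1j * omega
    return (1.0 / (1.0 + jw), jw / (1.0 + jw))


def operators_case2(omega):
    """Asymptotic operators on eta for (xi1_hat, xi2_hat), Case II."""
    jw = 1j * omega
    return (((2.0 - ROOT2) * jw + (ROOT2 - 1.0)) / (1.0 + jw),
            -((ROOT2 - 1.0) * jw + (2.0 - ROOT2)) / (1.0 + jw))


print(ode_residual(sigma_case1, rate_case1, 0.7),
      ode_residual(sigma_case2, rate_case2, 0.7))
print(gain_case1(0.0), gain_case2(3.0 - 2.0 * ROOT2))
for omega in (0.3, 2.0):
    h1, h2 = operators_case1(omega)
    g1, g2 = operators_case2(omega)
    print(omega, h1 + h2, g1 - g2)
```

In both cases the operators combine to unity on $\eta$, as they must when the observation is reproduced exactly by the estimate.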


A Note on Instrumentation. One aspect of the solution for continuous observation is maintaining at zero the difference between $y(\tau)$ and $A(\tau)\hat{x}(\tau)$, by absorbing variations in $y(\tau)$ immediately into $\hat{x}(\tau)$. Thus we call for the instantaneous increment in $\hat{x}(\tau)$ to be a multiple, say $C(\tau)$, of $z(\tau) = y(\tau) - A(\tau)\hat{x}(\tau,\tau)$. In practice, a difficulty may arise in a system operating on $z(\tau)$, because the change in $\hat{x}(\tau)$ immediately affects $z(\tau)$, so that infinite amplification of $z(\tau)$ appears called for. The simplest reaction is to be content with an approximate solution, multiplying $z(\tau)$ by a large number, say one million, to render the inaccuracy negligible. The estimation equations can be mathematically transformed in many cases, particularly if the function $A(\tau)$ is known for all time, to a form eliminating this difficulty; it can also be avoided by assuming the existence of a convenient limiting case, such as the "white noise" case or the stationary case considered by Wiener; finally, it can be avoided by sampling the observable function. One should take care, however, that errors arising from incorrect mathematical formulation or types of instrumentation requiring great precision do not far outweigh the inaccuracies due to finite gain in a system operating on $z(\tau)$.


V.2. Scalar Observations.

We shall now consider scalar observations, with the hypotheses of Theorem 4 not necessarily satisfied. Since $y(\tau)$ has only one component, we shall now write $\eta(\tau)$, $\zeta(\tau)$, and $a(\tau)$ for $y(\tau)$, $z(\tau)$, and $A(\tau)$; and singularity of a matrix of the form $a(\tau)Q(\tau)\bar{a}(\tau)$ is equivalent to a zero magnitude. We assume as in Theorem 4 that $a(\tau)$ is piecewise absolutely continuous, i.e., that $a(\tau)$ is differentiable except at a finite number of points, at which salti or isolated values occur (we are considering observation over a finite time).

We have previously considered the case in which $a(\tau)S(\tau,\tau-)\bar{a}(\tau)$ is non-zero; and that in which this form is zero, but $a(\tau)$ is differentiable and $a(\tau)Q(\tau)\bar{a}(\tau)$ is non-zero. This leaves the case in which $a(\tau)$ is differentiable but both quadratic forms are zero. It is reasonable in this case to consider differentiating $\eta(\tau)$, possibly obtaining a non-zero form corresponding to $a(\tau)Q(\tau)\bar{a}(\tau)$; the derivative of $\eta(\tau)$ will in general include components of $x(\tau)$ more directly related to the disturbances represented by $Q(\tau)$. The following theorem† indicates that, under mild restrictions, such differentiation is possible and is a nonsingular transformation of the function $\eta(\tau)$. We use the operator $*$, such that

$$*F(\tau) \triangleq F'(\tau) + F(\tau)B(\tau).$$

(In Theorem 5, $\bar{a}(\tau)$, $x(\tau)$, $\hat{x}(\tau,\tau)$, and $w(\sigma,\tau)$ are column vectors of $\mu$ components; $S(\tau,\tau)$, $Q(\tau)$, $J(\sigma,\tau)$, and $K(\sigma,\rho)$ are $\mu \times \mu$.)

†Here and later we make use of the fact that $AX\bar{A}$ is zero if and only if $AX$ is zero, for $X$ symmetric and non-negative definite.

Theorem 5. Given the hypotheses of Theorem 3, with $S(\tau,\tau)\bar{a}(\tau)$, $Q(\tau)\bar{a}(\tau)$, and $[Q(\tau)\bar{a}(\tau)]'$ zero; $*a$ and $**a$ existing; then, with probability one,

(5.1.1)  $$\zeta'(\tau) = (*a)[x(\tau) - \hat{x}(\tau,\tau)] = \eta'(\tau) - (*a)\hat{x}(\tau,\tau)$$

exists, and is a nonsingular transformation of $\zeta(\tau)$.

Proof. Let $\sigma > \tau$, with hypotheses satisfied at time $\tau$. From (3.0.3),

(5.2.1)  $$J(\sigma,\tau) \triangleq E\,w(\sigma,\tau)\bar{w}(\sigma,\tau) = \int_\tau^\sigma K(\sigma,\rho)\,Q(\rho)\,\bar{K}(\sigma,\rho)\,d\rho.$$

Using (3.0.1),

(5.2.2)  $$J'(\sigma,\tau) = B(\sigma)J(\sigma,\tau) + J(\sigma,\tau)\bar{B}(\sigma) + Q(\sigma).$$

Defining

(5.2.3)  $$\gamma(\sigma,\tau) \triangleq a(\sigma)J(\sigma,\tau)\bar{a}(\sigma),$$

and differentiating $\gamma(\sigma,\tau)$ with respect to $\sigma$, using (5.2.2) and the $*$ operator,

(5.2.4)  $$\gamma'(\sigma,\tau) = a'(\sigma)J(\sigma,\tau)\bar{a}(\sigma) + a(\sigma)[Q(\sigma) + B(\sigma)J(\sigma,\tau) + J(\sigma,\tau)\bar{B}(\sigma)]\bar{a}(\sigma) + a(\sigma)J(\sigma,\tau)\bar{a}'(\sigma)$$
$$= [*a(\sigma)]J(\sigma,\tau)\bar{a}(\sigma) + a(\sigma)J(\sigma,\tau)\overline{[*a(\sigma)]} + a(\sigma)Q(\sigma)\bar{a}(\sigma).$$

Similarly,

(5.2.5)  $$\gamma''(\sigma,\tau) = [**a]J\bar{a} + 2[*a]J\overline{[*a]} + aJ\overline{[**a]} + [*a]Q\bar{a} + aQ\overline{[*a]} + [aQ\bar{a}]'.$$

By (5.2.1) $J(\sigma,\tau)$ is continuous and zero at $\sigma = \tau$. Together with the hypotheses of this theorem, this makes every term of $\gamma(\sigma,\tau)$, $\gamma'(\sigma,\tau)$, and $\gamma''(\sigma,\tau)$ zero at $\sigma = \tau$. By a familiar theorem, it follows that

(5.2.6)

(5.2.7)

(5.2.8)  $$\eta'(\tau) = [a(\tau)x(\tau)]' = a'(\tau)x(\tau) + a(\tau)B(\tau)x(\tau) = [*a(\tau)]\,x(\tau),$$

with probability one. From (3.1.1),

(5.2.9)  $$[a(\tau)\hat{x}(\tau,\tau)]' = a'(\tau)\hat{x}(\tau,\tau) + a(\tau)B(\tau)\hat{x}(\tau,\tau) = [*a(\tau)]\,\hat{x}(\tau,\tau).$$

From (5.2.8) and (5.2.9), (5.1.1) is immediate. The transformation of $\zeta(\tau)$ into $\zeta'(\tau)$ is nonsingular, since $\zeta(\tau)$ is zero by the hypothesis that $S(\tau,\tau)\bar{a}(\tau) = 0$.

In the hypotheses of Theorem 5, some mild restrictions have been placed on the functions $a(\tau)$, $B(\tau)$ and $Q(\tau)$. We recall that the hypotheses of Theorem 3 required $B(\tau)$ and $Q(\tau)$ to be piecewise continuous, while those of Theorem 4 implied $A(\tau)$ piecewise absolutely continuous. The hypotheses of Theorem 5 require that $B(\tau)$ and any existing derivative of $a(\tau)$ be piecewise absolutely continuous. Repeated application of Theorem 5 will require that higher existing derivatives of $a(\tau)$ and $B(\tau)$ be piecewise absolutely continuous.

The second restriction imposed by the hypotheses of Theorem 5 is that when $S(\tau,\tau)\bar{a}(\tau)$ and $Q(\tau)\bar{a}(\tau)$ are zero, we must have $[Q(\tau)\bar{a}(\tau)]' = 0$, so that isolated zeros of $Q(\tau)\bar{a}(\tau)$ are inadmissible. This restriction does not appear to be onerous in practice, so that Theorem 5 has been simplified by assuming such a restriction. If it is not satisfied, the proof indicates that the result is merely to change the expected square of the derivative of $a(\sigma)w(\sigma,\tau)$ at $\sigma = \tau$ from zero to $[a(\tau)Q(\tau)\bar{a}(\tau)]'/2$, thus adding a term both to $\eta'(\tau)$ and $\zeta'(\tau)$. The practical consequence is that the observation should be ignored for any $\tau$ at which $S(\tau,\tau)\bar{a}(\tau)$ and $Q(\tau)\bar{a}(\tau)$ are zero, but $[a(\tau)Q(\tau)\bar{a}(\tau)]'$ is not zero.

When Theorem 5 is applicable, one simply replaces $\eta(\tau)$ by $\eta'(\tau)$, and $a(\tau)$ by $*a(\tau)$, then applies Theorem 2 or Theorem 4 if possible; otherwise one uses Theorem 5 again.

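The writer has found in the literature no illustrative examples requiring the use of Theorem 5. To show the method, we apply it in Example IV.2.b, a case of simple prediction. Here

$$x(\tau) = \begin{pmatrix} \xi_1(\tau) \\ \xi_2(\tau) \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix}, \quad Q = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

and $a(\tau) = (1, 0)$ for $\tau \ge 0$, assuming continuous observation beginning at time zero. Assuming only the nature of the process, we have $\hat{x}(0,0-) = 0$,

$$S(0,0-) = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}(0,0-) = \begin{pmatrix} \sqrt{2}/4 & 0 \\ 0 & \sqrt{2}/4 \end{pmatrix}.$$

The initial observation at time zero yields

$$\dot{\hat{x}}(0) = S(0,0-)\bar{a}(0)\,[a(0)S(0,0-)\bar{a}(0)]^{-1}\,\zeta(0) = \begin{pmatrix} \sqrt{2}/4 \\ 0 \end{pmatrix}(\sqrt{2}/4)^{-1}[\eta(0) - 0] = \begin{pmatrix} \eta(0) \\ 0 \end{pmatrix},$$

and

$$S(0,0) = S(0,0-) - \begin{pmatrix} \sqrt{2}/4 \\ 0 \end{pmatrix}(\sqrt{2}/4)^{-1}(\sqrt{2}/4,\ 0) = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{2}/4 \end{pmatrix}.$$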

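After the initial observation $S(\tau,\tau)\bar{a}(\tau)$, $a'(\tau)$, $Q(\tau)\bar{a}(\tau)$, and $[Q(\tau)\bar{a}(\tau)]'$ are all zero, so Theorem 5 can be applied.

$$*a(\tau) = a'(\tau) + a(\tau)B(\tau) = 0 + (1, 0)\begin{pmatrix} 0 & 1 \\ -1 & -\sqrt{2} \end{pmatrix} = (0, 1).$$

The expected value of the square of $\zeta'(0)$ is

$$E[\zeta'(0)]^2 = [*a(0)]\,S(0,0)\,\overline{[*a(0)]} = (0, 1)\begin{pmatrix} 0 & 0 \\ 0 & \sqrt{2}/4 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \sqrt{2}/4 \ne 0.$$

The initial observation of $\eta'(\tau)$ therefore gives

$$\dot{\hat{x}}(0_+) = S(0,0)\,\overline{[*a(0)]}\,\{[*a(0)]\,S(0,0)\,\overline{[*a(0)]}\}^{-1}\,\zeta'(0_+) = \begin{pmatrix} 0 \\ \sqrt{2}/4 \end{pmatrix}(\sqrt{2}/4)^{-1}(\eta'(0) - 0) = \begin{pmatrix} 0 \\ \eta'(0) \end{pmatrix};$$

$$S(0_+,0_+) = S(0,0) - \begin{pmatrix} 0 \\ \sqrt{2}/4 \end{pmatrix}(\sqrt{2}/4)^{-1}(0,\ \sqrt{2}/4) = 0.$$

Following the initial observation of $\eta'(\tau)$, $S(\tau,\tau)\overline{[*a(\tau)]}$ is zero, but

$$Q(\tau)\,\overline{[*a(\tau)]} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \ne 0,$$

so that Theorem 4 gives

$$\dot{\hat{x}}(\tau) = Q(\tau)\,\overline{[*a(\tau)]}\,\{[*a(\tau)]\,Q(\tau)\,\overline{[*a(\tau)]}\}^{-1}\,\zeta'(\tau) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\zeta'(\tau).$$

Since $S(\tau,\tau) = 0$, $S'(\tau,\tau) = Q = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$. The above expressions for $\dot{\hat{x}}(\tau)$ indicate that $\hat{x}(\tau) = x(\tau)$. After the initial observation of $\eta(0)$ and $\eta'(0_+)$, one has

$$\dot{S}(\tau,\tau) = -\begin{pmatrix} 0 \\ 1 \end{pmatrix}(1)^{-1}(0,\ 1) = -\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

showing formally that $x(\tau)$ has zero variance of estimate.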
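The arithmetic of this example is small enough to reproduce directly. The following sketch is illustrative only: it forms $*a = a' + aB$ for $a = (1, 0)$, evaluates the variance of $\zeta'(0)$ from $S(0,0)$, and applies the discrete update to show that the covariance of estimate drops to zero once $\eta'(0)$ has been used. The helper names are assumptions, not the report's notation.

```python
import math

ROOT2 = math.sqrt(2.0)
B = [[0.0, 1.0], [-1.0, -ROOT2]]
Q = [[0.0, 0.0], [0.0, 1.0]]


def star(a, a_prime, b):
    """*a = a' + a B for a row vector a."""
    n = len(a)
    return [a_prime[j] + sum(a[k] * b[k][j] for k in range(n)) for j in range(n)]


def times_transpose(m, a):
    """Column vector M a^T, returned as a flat list."""
    return [sum(row[k] * a[k] for k in range(len(a))) for row in m]


def quadratic_form(a, m):
    """Scalar a M a^T."""
    return sum(x * y for x, y in zip(a, times_transpose(m, a)))


def update(s, a):
    """Gain and covariance after a noise-free scalar observation with row a."""
    col = times_transpose(s, a)
    var = quadratic_form(a, s)
    gain = [c / var for c in col]
    s_new = [[s[i][j] - col[i] * col[j] / var for j in range(len(a))]
             for i in range(len(a))]
    return gain, s_new


a = [1.0, 0.0]
S00 = [[0.0, 0.0], [0.0, ROOT2 / 4.0]]         # covariance after observing eta(0)

print(times_transpose(S00, a), times_transpose(Q, a))  # both zero: Theorem 5 applies
star_a = star(a, [0.0, 0.0], B)                         # a is constant, so a' = 0
var_zeta_prime = quadratic_form(star_a, S00)
gain, S_after = update(S00, star_a)
print(star_a, var_zeta_prime, gain, S_after)
print(times_transpose(Q, star_a))                       # non-zero: Theorem 4 applies next
```

The last line shows $Q\,\overline{[*a]}$ non-zero, which is why the continuing observation of $\eta'(\tau)$ is handled by Theorem 4.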


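V.3. The General Problem. All the results needed to solve the estimation problem are now at hand. Summarizing results of V.2, when the observation is a scalar function with $a(\tau)$, $B(\tau)$ and $Q(\tau)$ possessing required piecewise continuity (absolute in some cases), at each $\tau$ we consider the sequence

$$a(\tau)S(\tau,\tau_-),\ a(\tau)Q(\tau),\ [*a(\tau)]S(\tau,\tau_-),\ [*a(\tau)]Q(\tau),\ [**a(\tau)]S(\tau,\tau_-),\ \ldots$$

the sequence terminating at the first non-zero term, called the critical term, which is of the form $[*^{\gamma}a(\tau)]H(\tau)$. One replaces $\eta(\tau)$ and $a(\tau)$ by the $\gamma$th derivative of $\eta(\tau)$ and $*^{\gamma}a(\tau)$, respectively, thus replacing $\zeta(\tau)$ by $\zeta^{(\gamma)}(\tau) = \eta^{(\gamma)}(\tau) - (*^{\gamma}a)\hat{x}(\tau,\tau_-)$. If the matrix $H(\tau)$ is $S(\tau,\tau_-)$ one applies Theorem 2, obtaining

$$\dot{\hat{x}}(\tau) = S(\tau,\tau_-)\,\overline{[*^{\gamma}a]}\,\{[*^{\gamma}a]\,S(\tau,\tau_-)\,\overline{[*^{\gamma}a]}\}^{-1}\,\zeta^{(\gamma)}(\tau),$$

$$S(\tau,\tau) - S(\tau,\tau_-) = -S(\tau,\tau_-)\,\overline{[*^{\gamma}a]}\,\{[*^{\gamma}a]\,S(\tau,\tau_-)\,\overline{[*^{\gamma}a]}\}^{-1}\,[*^{\gamma}a]\,S(\tau,\tau_-).$$

If the matrix $H(\tau)$ in the critical term is $Q(\tau)$ one applies Theorem 4, obtaining

$$\dot{\hat{x}}(\tau) = [S(\tau,\tau)\,\overline{[*^{\gamma}a]}]'\,\{[*^{\gamma}a]\,Q(\tau)\,\overline{[*^{\gamma}a]}\}^{-1}\,\zeta^{(\gamma)}(\tau),$$

$$\dot{S}(\tau,\tau) = -[S(\tau,\tau)\,\overline{[*^{\gamma}a]}]'\,\{[*^{\gamma}a]\,Q(\tau)\,\overline{[*^{\gamma}a]}\}^{-1}\,[[*^{\gamma}a]\,S(\tau,\tau)]'.$$

If the sequence has no critical term, the observation should be ignored. In all cases, from Theorem 3, one extrapolates continuously by

$$\hat{x}'(\tau+\alpha,\tau) = B(\tau+\alpha)\,\hat{x}(\tau+\alpha,\tau),$$

$$S'(\tau+\alpha,\tau) = B(\tau+\alpha)S(\tau+\alpha,\tau) + S(\tau+\alpha,\tau)\bar{B}(\tau+\alpha) + Q(\tau+\alpha),$$

with $\alpha \to 0$ when one is sequentially calculating $\hat{x}(\tau,\tau)$.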


In considering observations which are non-scalar functions of time, we must first transform each scalar observation as discussed above, in order to avoid singularity of matrices to be inverted. To simplify notation, let us now use the symbols $\eta_i(\tau)$, $a_i(\tau)$, and $\zeta_i(\tau)$ for the transformed values of the $i$th observation, coefficient row vector, and difference between observation and predicted observation respectively. Scalar observations with no critical term should be ignored.


Let $A^S(\tau)$ constitute the matrix formed of row vectors $a_i(\tau)$ corresponding to critical terms in $S(\tau,\tau_-)$; and $z^S(\tau)$ the vector of corresponding $\zeta_i(\tau)$. If the matrix $A^S(\tau)S(\tau,\tau_-)\bar{A}^S(\tau)$ is nonsingular, Theorem 2 can be applied, giving

$$\dot{\hat{x}}^S(\tau) = S(\tau,\tau_-)\bar{A}^S(\tau)\,[A^S(\tau)S(\tau,\tau_-)\bar{A}^S(\tau)]^{-1}\,z^S(\tau),$$

$$S(\tau,\tau) - S(\tau,\tau_-) = -S(\tau,\tau_-)\bar{A}^S(\tau)\,[A^S(\tau)S(\tau,\tau_-)\bar{A}^S(\tau)]^{-1}\,A^S(\tau)S(\tau,\tau_-).$$

Similarly, let $A^Q(\tau)$ and $z^Q(\tau)$ represent the matrix of coefficients and vector of scalar observations for which the critical terms contain $Q(\tau)$, and apply Theorem 4 if $A^Q(\tau)Q(\tau)\bar{A}^Q(\tau)$ is nonsingular:

$$\dot{\hat{x}}^Q(\tau) = [S(\tau,\tau)\bar{A}^Q(\tau)]'\,[A^Q(\tau)Q(\tau)\bar{A}^Q(\tau)]^{-1}\,z^Q(\tau),$$

with $-\dot{S}(\tau,\tau)$ of similar form, $z^Q(\tau)$ being replaced by the transpose of the first term.

Thus, in summary we vary $\hat{x}(\tau,\tau)$ by $\hat{x}'(\tau,\tau)$, $\dot{\hat{x}}^S(\tau)$, and $\dot{\hat{x}}^Q(\tau)$; and vary $S(\tau,\tau)$ by $S'(\tau,\tau)$, $\dot{S}(\tau,\tau)$ and the incremental term involving $A^S(\tau)S(\tau,\tau_-)$. There remains a question regarding possible singularity of the matrices to be inverted. The only reasonable possibility of singularity appears to be observations which are linearly dependent, in which case an elementary algebraic transformation should be used to remove the singularity.
In previous sections, it has not been assumed that the process $\{x(\tau)\}$ is Gaussian, as such an assumption is of no significance regarding linear estimation with minimum variance of estimate. In this section we consider an important aspect of Gaussian $x(\tau)$, namely that $\hat{x}(\tau+\alpha,\tau)$ and $S(\tau+\alpha,\tau)$ define completely the distribution of $x(\tau+\alpha)$ conditioned by the observation. This implies that a) the estimate of any function of $x(\tau+\alpha)$, which minimizes the expectation of any loss function, is specified as a function of $\hat{x}(\tau+\alpha,\tau)$ and $S(\tau+\alpha,\tau)$; and b) $\hat{x}(\tau+\alpha,\tau)$ is the minimax estimate, with respect to mean square error of estimate, over any class of possible distributions of $\{x(\tau)\}$ which includes Gaussian $\{x(\tau)\}$.

Asymptotically best estimation in ergodic Gaussian $\{x(\tau)\}$, treated by Wiener, 1949, will be considered briefly.

VI.1 Sufficiency of Best Linear Estimates

It is well known (cf. Doob, 1953, pp. 561-562) that the estimate minimizing variance of estimate is linear when $\{x(\tau)\}$ is Gaussian and observations are linear in $x(\tau)$, that is, that $\hat{x}(\tau+\alpha,\tau)$ is the expectation, conditioned by the observations, of $x(\tau+\alpha)$. It follows that $\hat{x}(\tau+\alpha,\tau)$ is a sufficient statistic for $x(\tau+\alpha)$, and that $\hat{x}(\tau+\alpha,\tau)$ and $S(\tau+\alpha,\tau)$ together specify the conditional distribution of $x(\tau+\alpha)$.

If we wish to estimate any function of $x(\tau+\alpha)$, say $f(x_{\tau+\alpha})$, by $f^*(x_{\tau+\alpha})$, so as to minimize the expectation of a specified loss function, say $\varphi[f(x_{\tau+\alpha}), f^*(x_{\tau+\alpha})]$, we need only minimize the expectation of $\varphi$ over the conditional distribution of $x_{\tau+\alpha}$, which is defined by $\hat{x}(\tau+\alpha,\tau)$ and $S(\tau+\alpha,\tau)$. Consequently if $\{x(\tau)\}$ is Gaussian there is no need to restrict ourselves to minimizing the variance of estimate of linear functions of $x(\tau+\alpha)$; once $\hat{x}(\tau+\alpha,\tau)$ is determined we can minimize the expectation of any loss function of any function of $x(\tau+\alpha)$. The further problem is merely that of minimizing the integral:

$$\lambda(f^*) \triangleq (2\pi)^{-\nu/2}\,|S|^{-1/2}\int \varphi[f^*, f(x)]\;e^{-\frac{1}{2}\overline{(x - \hat{x})}\,S^{-1}(x - \hat{x})}\,dx.$$

In this expression we assume the time argument fixed, and also assume that $S$ is nonsingular, possibly as a result of reduction of the number of components of the vector $x$. The minimization of the indicated $\lambda(f^*)$ is not typically difficult, although analytic methods will not always suffice. In many cases the problem is simplified by nulling the derivative of $\lambda(f^*)$ with respect to $f^*$. If $f(x)$ is a linear transformation of $x$, and $\varphi$ is a symmetric non-negative function monotone non-decreasing with $|f^* - f|$ increasing, then the choice $f^* = f(\hat{x})$ minimizes $\lambda(f^*)$.

Example VI.1.a. Suppose $f(x) = x$ and

$$\varphi(x^*, x) = (x^*-x)'\,C_1\,(x^*-x)\,\exp\left\{-\tfrac{1}{2}(x-c)'\,C_2\,(x-c)\right\},$$

with $C_1$ and $C_2$ symmetric and non-negative definite. Here $c$ may be a vector chosen on the basis that for $x$ near $c$, small errors of estimate are specially desirable. This is an example of minimization of a weighted mean square, a topic which has been given some attention recently. The specified loss function can represent quite a variety of situations, including the basic one considered in this paper ($C_2 = 0$). Letting $\gamma$ represent any positive scalar not depending on $x$ or $x^*$, we have


$$\lambda(x^*) = \gamma \int (x^*-x)'\,C_1\,(x^*-x)\,\exp\left\{-\tfrac{1}{2}\left[(x-c)'\,C_2\,(x-c) + (x-\hat{x})'\,S^{-1}(x-\hat{x})\right]\right\} dx.$$


Apart from a term not depending on $x$, which can be absorbed into $\gamma$, the quadratic form in the exponent is

$$\left[x - (C_2+S^{-1})^{-1}(C_2c+S^{-1}\hat{x})\right]'\,(C_2+S^{-1})\,\left[x - (C_2+S^{-1})^{-1}(C_2c+S^{-1}\hat{x})\right].$$

Thus $\lambda(x^*)$ is proportional to the expectation of $(x^*-x)'\,C_1\,(x^*-x)$ over $x$ normally distributed with mean $(C_2+S^{-1})^{-1}(C_2c+S^{-1}\hat{x})$, so that this mean is the minimizing $x^*$. Note that $x^*$ does not depend on $C_1$, and is a linear combination of $\hat{x}$ and $c$, replacing $\hat{x}$ by an $x^*$ biased in the direction of $c$, proportionally to $c-\hat{x}$. Such a bias is to be expected on the basis of the prescribed loss function.
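The minimizing value in Example VI.1.a can be checked numerically. The following is a minimal sketch in Python with NumPy; the function name and all numerical values are hypothetical and chosen only for illustration. It evaluates $(C_2+S^{-1})^{-1}(C_2c+S^{-1}\hat{x})$ and confirms that it equals $\hat{x}$ plus a shift proportional to $c-\hat{x}$, and that it reduces to $\hat{x}$ when $C_2 = 0$.

```python
import numpy as np

def weighted_loss_minimizer(x_hat, S, c, C2):
    """Return (C2 + S^-1)^-1 (C2 c + S^-1 x_hat), the minimizing x* of the example."""
    S_inv = np.linalg.inv(S)
    return np.linalg.solve(C2 + S_inv, C2 @ c + S_inv @ x_hat)

# Hypothetical values, for illustration only.
x_hat = np.array([1.0, -2.0])             # best linear estimate
S = np.array([[2.0, 0.5], [0.5, 1.0]])    # nonsingular covariance of the estimate
c = np.array([0.0, 0.0])                  # point near which small errors matter most
C2 = np.array([[1.0, 0.0], [0.0, 3.0]])   # symmetric, non-negative definite weight

x_star = weighted_loss_minimizer(x_hat, S, c, C2)

# Equivalent form: x_hat shifted toward c by (C2 + S^-1)^-1 C2 (c - x_hat).
shift = np.linalg.solve(C2 + np.linalg.inv(S), C2 @ (c - x_hat))
x_star_shift_form = x_hat + shift

# With C2 = 0 the loss is the ordinary quadratic loss and x* is x_hat itself.
x_star_basic = weighted_loss_minimizer(x_hat, S, c, np.zeros((2, 2)))
```

Solving the linear system rather than forming $(C_2+S^{-1})^{-1}$ explicitly is a numerical convenience, not something the text prescribes.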


VI.2  A Minimax Property of Best Linear Estimates

We observed above that the best linear estimate is best among all estimates, if $\{x(t)\}$ is Gaussian. On the other hand, best nonlinear estimates are superior to best linear estimates for all but a few distributions of $\{x(t)\}$; Example III.4.b is a case in which the best linear estimate has asymptotic efficiency of zero. In most cases the complete form of distribution of $\{x(t)\}$ is not known, although it is frequently assumed to be approximately Gaussian.

For $\{x(t)\}$ satisfying the relevant hypotheses of our theorems, it can be seen that the best linear estimate is minimax over any class of $\{x(t)\}$ distributions including the Gaussian. This result follows at once from the facts that the variance of estimate of a linear estimate does not depend on the distribution of $x(t)$ except for the first two moments; and that the best linear estimate uniquely minimizes the variance of estimate for Gaussian $\{x(t)\}$.
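The first of these two facts can be illustrated by simulation. The sketch below, in Python with NumPy, assumes a hypothetical scalar model $y = x + n$ that is not taken from the paper, and applies the same linear estimate $ky$ to a Gaussian $x$ and to a uniformly distributed $x$ with the same mean and variance. The sample mean square errors should agree with each other and with the value computed from second moments alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
sigma_x, sigma_n = 1.0, 0.5                    # hypothetical standard deviations
k = sigma_x**2 / (sigma_x**2 + sigma_n**2)     # best linear gain for y = x + noise

def linear_mse(x):
    """Sample mean square error of the linear estimate k*y, with y = x + noise."""
    noise = rng.normal(0.0, sigma_n, size=x.shape)
    y = x + noise
    return np.mean((k * y - x) ** 2)

# Two distributions of x with identical first and second moments.
x_gauss = rng.normal(0.0, sigma_x, size=n_samples)
half_width = np.sqrt(3.0) * sigma_x            # uniform on (-a, a) has variance a^2 / 3
x_unif = rng.uniform(-half_width, half_width, size=n_samples)

# Value implied by the second moments alone.
mse_theory = (1 - k) ** 2 * sigma_x**2 + k**2 * sigma_n**2
mse_gauss = linear_mse(x_gauss)
mse_unif = linear_mse(x_unif)
```

For the uniform case a nonlinear estimate could do better than $ky$, which is the other half of the minimax argument; the sketch does not attempt to show that.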

VI.3  Ergodic Gaussian Processes

The problem of best linear estimation in ergodic Gaussian processes was considered by Wiener, 1949, who derived for ergodic Gaussian observation processes* the time-invariant operator asymptotically minimizing the variance of estimate of the observation, or of a desired component of the observation, called the signal. The examples considered by Wiener are discussed in Sections III, IV, and V of this paper. In one sense, the problem treated by Wiener is a special limiting case of the general problem considered in this paper, with B(t) and Q(t) constant and restricted so that $\{x(t)\}$ be ergodic, and A(t) constant from indefinitely in the past to the time of latest observation.

In  another  sense,  Wiener's  problems  are  more  general  than  those  of 
this  paper,  as  he  considered  all  stationary  Gaussian  processes  possessing  spec¬ 
tral  density  functions,  whereas  our  stationary  Gaussian  processes  cannot  have 
spectral  density  functions  which  are  not  rational.  This  distinction  does  not 
appear  to  be  extremely  great,  however,  in  view  of  the  fact  that  every  spectral 
density  function  can  be  approximated  arbitrarily  closely  by  a  rational  spectral 
density  function.  In  Wiener's  one  example  with  a  non-rational  spectral  density 
function,  namely  e~^  ,  he  approximated  by  a  rational  function  in  order  to  pre¬ 
dict  approximately. 

In estimating parameters of a distribution function, the statistician often uses inefficient estimates justified by asymptotic efficiency, when the asymptotically efficient estimate is easy to calculate and not too far from optimum. Similarly, one should often use Wiener's methods to estimate in random processes when the simplicity of the method outweighs inaccuracies which in many cases are extremely minute. The reader can find excellent expositions of Wiener's method by Lee, 1960, and Darlington, 1958. A simple application of Theorem 2 to the initial observation will sometimes reduce the error associated with estimation using Wiener's methods.

* The results are valid also for best linear estimation with non-Gaussian and non-ergodic stationary processes.

With vector observation functions, there may be considerable difficulty in determining the asymptotically optimum linear operators. The techniques of this paper supply a method of determination, since the problem is reduced to that of finding the asymptotic value of the $S(\tau,\tau)$ matrix, with
dS(T,T)/dT  -  s'(t,t)  +  S(t,t).  This  is  a  matrix  Rlccatl  equation,  studied  by 
Reid, 1946, and Levin, 1959. Writing the matrix Riccati equation as

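$$\frac{d\Gamma(\tau)}{d\tau} = G_2(\tau) + G_1(\tau)\Gamma(\tau) - \Gamma(\tau)G_4(\tau) - \Gamma(\tau)G_3(\tau)\Gamma(\tau),$$

we have

$$\Gamma(\tau) = \left[M_1(\tau)\Gamma(0) + M_2(\tau)\right]\left[M_3(\tau)\Gamma(0) + M_4(\tau)\right]^{-1},$$

where $dM(\tau)/d\tau = G(\tau)M(\tau)$; more explicitly,

$$\frac{d}{d\tau}\begin{pmatrix} M_1(\tau) & M_2(\tau) \\ M_3(\tau) & M_4(\tau) \end{pmatrix} = \begin{pmatrix} G_1(\tau) & G_2(\tau) \\ G_3(\tau) & G_4(\tau) \end{pmatrix}\begin{pmatrix} M_1(\tau) & M_2(\tau) \\ M_3(\tau) & M_4(\tau) \end{pmatrix},$$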

with $M(0) = I$, provided the inverse and integral indicated exist throughout the interval $(0,\tau)$. The proof given by Levin [1959] is as follows: substitute $\Gamma(\tau)$ into $d\Gamma(\tau)/d\tau$. With $\Gamma(\tau)$ representing $S(\tau,\tau)$, $G_1(\tau) = -G_4'(\tau)$, and $G_2(\tau)$ and $G_3(\tau)$ are symmetric. In the Wiener case, the $G$ matrix is constant, and we seek the asymptotic value of $\Gamma(\tau)$, with $\Gamma(0)$ the $S(0,0)$ determined from integration of spectral densities. The differential equation for the $M$ matrix, $dM(\tau)/d\tau = GM(\tau)$, is a constant coefficient linear differential equation, with well known methods of solution. Thus the asymptotic value of $S(\tau,\tau)$ can be obtained routinely. We note that the differential equation for $\begin{pmatrix} M_1 \\ M_3 \end{pmatrix}$ is the same as that for $\begin{pmatrix} M_2 \\ M_4 \end{pmatrix}$, but that initial values are $\begin{pmatrix} I \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ I \end{pmatrix}$, respectively. Note also that $S(\tau,\tau)$ is typically singular, and that computation will then be simplified by reduction to a smaller matrix; several examples of this paper are illustrative.
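The reduction of the matrix Riccati equation to a linear system can be sketched numerically for a constant $G$ matrix. The Python code below assumes the block convention $dM/d\tau = GM$ with $G$ partitioned into $G_1, G_2, G_3, G_4$ and $M(0) = I$, and checks $\Gamma(\tau) = [M_1\Gamma(0)+M_2][M_3\Gamma(0)+M_4]^{-1}$ against direct Runge-Kutta integration of $d\Gamma/d\tau = G_2 + G_1\Gamma - \Gamma G_4 - \Gamma G_3\Gamma$. The numerical values, the helper `expm`, and the function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def expm(A, terms=20, squarings=10):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    scaled = A / (2.0 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def riccati_via_linear_system(G1, G2, G3, G4, gamma0, tau):
    """Gamma(tau) from M(tau) = exp(G tau), for a constant G matrix."""
    n = G1.shape[0]
    M = expm(np.block([[G1, G2], [G3, G4]]) * tau)
    M1, M2, M3, M4 = M[:n, :n], M[:n, n:], M[n:, :n], M[n:, n:]
    return (M1 @ gamma0 + M2) @ np.linalg.inv(M3 @ gamma0 + M4)

def riccati_rk4(G1, G2, G3, G4, gamma0, tau, steps=2000):
    """Direct fourth-order Runge-Kutta integration of the Riccati equation."""
    def rhs(g):
        return G2 + G1 @ g - g @ G4 - g @ G3 @ g
    g, h = gamma0.copy(), tau / steps
    for _ in range(steps):
        k1 = rhs(g)
        k2 = rhs(g + 0.5 * h * k1)
        k3 = rhs(g + 0.5 * h * k2)
        k4 = rhs(g + h * k3)
        g = g + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return g

# Hypothetical constant blocks with G1 = -G4' and G2, G3 symmetric.
B = np.array([[0.0, 1.0], [-1.0, -0.5]])
G1, G4 = B, -B.T
G2 = np.diag([0.0, 1.0])
G3 = np.diag([1.0, 0.0])
gamma0 = np.eye(2)

closed = riccati_via_linear_system(G1, G2, G3, G4, gamma0, tau=2.0)
numeric = riccati_rk4(G1, G2, G3, G4, gamma0, tau=2.0)
```

Evaluating `riccati_via_linear_system` at increasing `tau` gives one way to approach the asymptotic value, although for large `tau` the blocks of $M$ grow and a better-conditioned method may be preferable.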

The Wiener methods can be extended, at least with scalar observations, to derive minimax estimation procedures, when the spectral density of the signal, the remainder of the observation (called noise), or both are not completely specified but subjected to linear restrictions [Carlton and Follin, 1956]. The minimax estimation procedures minimize the maximum of the variance of estimate over all spectral density functions satisfying the linear constraints. The constraints may be inequalities or equalities, of the form [expression not recoverable from the source], with $\theta_i$ a prescribed symmetric non-negative function, and a bounding constant [symbol not recoverable] either unity or an arbitrary number > 1. $\theta(\omega)$ proportional to 1, $\omega^2$, or $\omega^4$ represents bounded mean square magnitude, velocity, or acceleration of the process considered. The minimax linear operator is the Wiener operator based on the maximin spectral density. The maximin spectral density is such that the square of the absolute value of the resulting minimax frequency operator is a constant linear combination of the $\theta_i$ at all frequencies with nonzero spectral density, and equal to or less than this constant linear combination at frequencies with zero spectral density.

BEST ABSOLUTELY UNBIASED ESTIMATION IN WIDE SENSE MARKOV PROCESSES


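Consider discrete linear observations on a process which is vector Markov in the wide sense, as in the hypotheses of Theorem 2, with $K_{\alpha,\alpha-1}^{-1}$ existing and denoted by $K_{\alpha-1,\alpha}$. Repeated application of

$$x_\alpha = K_{\alpha,\alpha-1}\,x_{\alpha-1} + v_\alpha, \qquad x_\alpha = K_{\alpha,\alpha+1}\,(x_{\alpha+1} - v_{\alpha+1})$$

yields

$$x_\alpha = K_{\alpha,\mu}\,x_\mu + \sum_{\beta=2}^{\nu} K_{\alpha,\beta,\mu}\,v_\beta,$$

with $K_{\alpha,\beta,\mu} = -K_{\alpha,\beta}$ if $(\alpha < \beta < \mu+1)$, $K_{\alpha,\beta,\mu} = K_{\alpha,\beta}$ if $(\mu < \beta < \alpha+1)$, and zero otherwise.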


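Writing $x_\mu = \begin{pmatrix} x_{(1)} \\ x_{(2)} \end{pmatrix}$, where $x_{(1)}$ is the subvector of $x_\mu$ for which the best absolutely unbiased linear estimate is desired, and correspondingly writing $K_{\alpha,\mu} = \left(L_{\alpha,\mu,(1)},\ L_{\alpha,\mu,(2)}\right)$, we have

$$x_\alpha = L_{\alpha,\mu,(1)}\,x_{(1)} + L_{\alpha,\mu,(2)}\,x_{(2)} + \sum_{\beta=2}^{\nu} K_{\alpha,\beta,\mu}\,v_\beta.$$

Thus the observation $\eta_\alpha$ is

$$\eta_\alpha = a_\alpha x_\alpha = a_\alpha L_{\alpha,\mu,(1)}\,x_{(1)} + a_\alpha L_{\alpha,\mu,(2)}\,x_{(2)} + \sum_{\beta=2}^{\nu} a_\alpha K_{\alpha,\beta,\mu}\,v_\beta.$$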


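To obtain an absolutely unbiased estimate, we must have the "error of observation," $\eta_\alpha - a_\alpha L_{\alpha,\mu,(1)}\,x_{(1)}$, with zero mean for every $x_{(1)}$. The hypotheses on $v_\beta$ are sufficient, provided $a_\alpha L_{\alpha,\mu,(2)}\,x_{(2)}$ has zero mean for every $x_{(1)}$. If $x_{(2)}$ has non-zero mean, one should replace $x_{(2)}$ by $[x_{(2)} - Ex_{(2)}]$ and $\eta_\alpha$ by $\eta_\alpha - a_\alpha L_{\alpha,\mu,(2)}\,Ex_{(2)}$; the requirement is then satisfied if the mean of the adjusted term, $a_\alpha L_{\alpha,\mu,(2)}\,[x_{(2)} - Ex_{(2)}]$, is zero for every $x_{(1)}$, a condition which surely is satisfied if $x_{(1)}$ and $x_{(2)}$ are mutually independent. The vector $y$, whose $\alpha$-th component is $\eta_\alpha$, can now be written as

$$y = Mx_{(1)} + G_1x_{(2)} + \sum_{\beta=2}^{\nu} G_\beta v_\beta,$$

with the $\alpha$-th row of $M$, of $G_1$, and of $G_\beta$ $(2 \le \beta \le \nu)$: $a_\alpha L_{\alpha,\mu,(1)}$; $a_\alpha L_{\alpha,\mu,(2)}$; and $a_\alpha K_{\alpha,\beta,\mu}$, respectively. Assuming that the mean of $G_1x_{(2)}$ is zero for every $x_{(1)}$, and that $M'M$ and $R = E(y - Mx_{(1)})(y - Mx_{(1)})'$ are nonsingular, Theorem 0 can be applied to give

$$\hat{x}_{(1)} = (M'R^{-1}M)^{-1}M'R^{-1}y$$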


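with

$$R = E\left(G_1x_{(2)} + \sum_{\beta=2}^{\nu} G_\beta v_\beta\right)\left(G_1x_{(2)} + \sum_{\beta=2}^{\nu} G_\beta v_\beta\right)' = GVG',$$

with $G = (G_1, G_2, \ldots, G_\nu)$ and

$$V = \begin{pmatrix} Ex_{(2)}x_{(2)}' & & & \\ & Ev_2v_2' & & \\ & & \ddots & \\ & & & Ev_\nu v_\nu' \end{pmatrix},$$

using the orthogonality of $x_{(2)}, v_2, \ldots, v_\nu$.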
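The estimate $(M'R^{-1}M)^{-1}M'R^{-1}y$ with $R = GVG'$ can be sketched in a few lines of Python. The matrices below are random placeholders, not values derived from any process in this paper, and the function name is hypothetical. The sketch checks that the weighting matrix $W$ satisfies $WM = I$, which is the condition for the estimate to be unbiased for every $x_{(1)}$, and compares its error covariance with that of unweighted least squares.

```python
import numpy as np

def best_unbiased_gain(M, R):
    """Return W = (M' R^-1 M)^-1 M' R^-1, so that the estimate is W y."""
    R_inv_M = np.linalg.solve(R, M)                     # R^-1 M
    return np.linalg.solve(M.T @ R_inv_M, R_inv_M.T)    # uses the symmetry of R

rng = np.random.default_rng(1)

# Hypothetical sizes: 6 scalar observations, x_(1) of dimension 2,
# and 8 columns in G = (G_1, G_2, ..., G_nu) taken together.
M = rng.normal(size=(6, 2))
G = rng.normal(size=(6, 8))
V = np.diag(rng.uniform(0.5, 2.0, size=8))   # stand-in for the block-diagonal V
R = G @ V @ G.T

W = best_unbiased_gain(M, R)
cov_best = W @ R @ W.T                       # error covariance of W y

# Unweighted least squares is also unbiased, but has no smaller covariance.
W_ols = np.linalg.solve(M.T @ M, M.T)
cov_ols = W_ols @ R @ W_ols.T
smallest_eig_of_difference = np.linalg.eigvalsh(cov_ols - cov_best).min()
```

The error covariance `cov_best` equals $(M'R^{-1}M)^{-1}$, so in practice it need not be formed from `W` as done here.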


The computation to obtain $\hat{x}_{(1)}$ is extensive, particularly if one wants to estimate such a subvector for various values of $\alpha$. It has been exhibited to indicate the possibility of absolutely unbiased estimation in cases not generally considered parametric. There is a widespread tendency to seek best absolutely unbiased estimates of parameters, but best estimates of non-trivial random variables. The logical basis for such a tendency is not clear, as it would seem more difficult to obtain a priori information for complicated unknowns than for simple unknowns.

WIDE SENSE VECTOR MARKOV PROCESSES


Letting $\hat{E}$ represent the best linear estimate of random variables in terms of conditioning variables, the wide sense vector Markov processes are those with the property:

$$\hat{E}\left[x(\tau_\nu) \mid x(\tau_{\nu-1}), x(\tau_{\nu-2}), \ldots, x(\tau_1)\right] = \hat{E}\left[x(\tau_\nu) \mid x(\tau_{\nu-1})\right], \tag{B-1}$$

whenever $\tau_\nu \ge \tau_{\nu-1} \ge \cdots \ge \tau_1$. Considering first a fixed sequence $\tau_1, \tau_2, \ldots, \tau_\nu, \tau_{\nu+1}, \ldots$, denote $x(\tau_1), x(\tau_2), \ldots$ by $x_1, x_2, \ldots$. Then property (B1) is equivalent to

$$x_\nu = K_{\nu,\nu-1}\,x_{\nu-1} + v_\nu, \tag{B-2}$$

with $v_\nu$ orthogonal to $x_1, x_2, \ldots, x_{\nu-1}$ and to $v_1, v_2, \ldots, v_{\nu-1}$. It is evident that (B2) implies (B1). To show that (B1) implies (B2), we write (B1) as

$$\hat{E}(x_\nu \mid x_{\nu-1}, \ldots, x_1) = K_{\nu,\nu-1}\,x_{\nu-1},$$

implying that $x_\nu = K_{\nu,\nu-1}\,x_{\nu-1} + v_\nu$, with $v_\nu$ orthogonal to $x_{\nu-1}$. Similarly, $x_{\nu-1} = K_{\nu-1,\nu-2}\,x_{\nu-2} + v_{\nu-1}$, with $v_{\nu-1}$ orthogonal to $x_{\nu-2}$. Since $v_\nu$ is orthogonal to $x_{\nu-1}$, a linear combination of the orthogonal variables $x_{\nu-2}$ and $v_{\nu-1}$, and by (B1) is also orthogonal to $x_{\nu-2}$, it follows that $v_\nu$ is orthogonal to $x_{\nu-2}$ and $v_{\nu-1}$. Repeating this argument, it is seen that $v_\nu$ is orthogonal to $v_{\nu-1}, \ldots, v_1$, and to $x_{\nu-1}, \ldots, x_1$, establishing (B2).

To extend the property (B2) to all sequences $\tau_1, \tau_2, \ldots$, we write (B2) as

$$x(\tau_2) = K(\tau_2,\tau_1)\,x(\tau_1) + w(\tau_2,\tau_1), \qquad (\tau_2 \ge \tau_1) \tag{B-3}$$

with $w(\tau_2,\tau_1)$ orthogonal to $w(\tau_4,\tau_3)$ for $\tau_4 \ge \tau_3 \ge \tau_2$, and orthogonal to $x(\tau)$ for $\tau \le \tau_1$. Repeated application of (B3) shows that for $\tau_3 \ge \tau_2 \ge \tau_1$,

$$K(\tau_3,\tau_1) = K(\tau_3,\tau_2)\,K(\tau_2,\tau_1), \tag{B-4}$$

and it is clear from (B3) that $K(\tau,\tau) = I$, $w(\tau,\tau) = 0$. The nonsingular solution of the functional equation (B4), with $K(\tau,\tau) = I$, satisfies the relation

$$\frac{\partial K(\tau_2,\tau_1)}{\partial \tau_2} = B(\tau_2)\,K(\tau_2,\tau_1), \tag{B-5}$$

with $B(\sigma)$ any function bounded over the interval of interest. (In Theorem 3, it is required that $B(\sigma)$ be bounded and piecewise continuous, a condition which seems necessary for practical applications.)

In order that the process $\{x(t)\}$ possess finite variance, $w(\tau_2,\tau_1)$ must have finite variance. The variance of $w(\tau_2,\tau_1)$, if continuous, can be written as

$$E\,w(\tau_2,\tau_1)\,w'(\tau_2,\tau_1) = \int_{\tau_1}^{\tau_2} K(\tau_2,\sigma)\,Q(\sigma)\,K'(\tau_2,\sigma)\,d\sigma, \tag{B-6}$$

with $Q(\sigma)$ a non-negative definite function bounded over the interval of interest. (In Theorem 3, considering practical applications, it is required that $Q(\sigma)$ be bounded and piecewise continuous.)
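Relations (B4) and (B6) can be checked numerically in the simplest case of constant $B$ and $Q$, for which $K(\tau_2,\tau_1) = \exp[B(\tau_2-\tau_1)]$. The Python sketch below uses hypothetical values of $B$ and $Q$ and a trapezoidal rule for the integral in (B6). It verifies the composition rule (B4), and also the decomposition $w(\tau_3,\tau_1) = K(\tau_3,\tau_2)\,w(\tau_2,\tau_1) + w(\tau_3,\tau_2)$ with orthogonal terms, which follows from applying (B3) twice.

```python
import numpy as np

def expm(A, terms=20, squarings=10):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    scaled = A / (2.0 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

# Hypothetical constant coefficient matrices.
B = np.array([[0.0, 1.0], [-2.0, -0.3]])
Q = np.diag([0.0, 1.0])

def K(t2, t1):
    """Transition function solving (B5) for constant B, with K(t, t) = I."""
    return expm(B * (t2 - t1))

def w_variance(t2, t1, steps=400):
    """Trapezoidal approximation of the integral in (B6)."""
    sigmas = np.linspace(t1, t2, steps + 1)
    vals = np.array([K(t2, s) @ Q @ K(t2, s).T for s in sigmas])
    h = (t2 - t1) / steps
    return h * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)

t1, t2, t3 = 0.0, 0.7, 1.5

# Composition rule (B4).
semigroup_gap = np.max(np.abs(K(t3, t1) - K(t3, t2) @ K(t2, t1)))

# Variance of w(t3, t1) assembled from the two sub-intervals.
direct = w_variance(t3, t1)
assembled = K(t3, t2) @ w_variance(t2, t1) @ K(t3, t2).T + w_variance(t3, t2)
variance_gap = np.max(np.abs(direct - assembled))
```

The two gaps should be at the level of rounding error and quadrature error respectively; the step count of the trapezoidal rule is an arbitrary choice.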

From (B6), it is seen that the derivative $\dot{w}(\tau,\tau)$ of $w(\sigma,\tau)$ with respect to $\sigma$ at $\sigma = \tau$ does not exist unless $w(\sigma,\tau)$ is zero (with probability one) for $\sigma$ in the neighborhood to the right of $\tau$; for if it did exist, we would have

$$Q(\tau) = \left.\frac{d}{d\sigma}\,E\,w(\sigma,\tau)\,w'(\sigma,\tau)\right|_{\sigma=\tau} = E\,\dot{w}(\tau,\tau)\,w'(\tau,\tau) + E\,w(\tau,\tau)\,\dot{w}'(\tau,\tau),$$

with $Q(\tau)$ non-zero and $w(\tau,\tau)$ zero. Applying this result to the individual components of $w(\sigma,\tau)$, it is seen that no component of $x(t)$ can include the derivative of a component for which the corresponding component of $w(\sigma,\tau)$ is not zero in the neighborhood of $\tau$, i.e., for which the corresponding rows and columns of $Q(\tau)$ are not zero.

The processes specified by (B3), (B5), (B6) clearly have continuous first and second moments, and thus are continuous in quadratic mean (Loève, 1955, p. 470). In extending (B2) to apply simultaneously to all ordered times, we have imposed two restrictions: first, that the transition function be nonsingular; second, that the variance $E\,w(\tau_2,\tau_1)\,w'(\tau_2,\tau_1)$ be continuous. In the vast majority of cases, these restrictions are of no significance. If $E\,w(\tau_2,\tau_1)\,w'(\tau_2,\tau_1)$ is not continuous, Theorem 2 can be applied at any point of discontinuity. Singular continuous transition functions typically imply trivial components of the x-vector, which present no problem.

As examples of wide sense vector Markov processes, we use in the body of the paper all the examples considered in the papers of Wiener, 1949 and Shinbrot, 1956. As another example, we mention the Gaussian vector processes $\{x(\tau)\}$ with first and second moments satisfying our restrictions. Letting the interval of interest begin at $\tau_0$, the Gaussian processes satisfying (B3), (B5), and (B6) are those with $Ex(\tau) = K(\tau,\tau_0)\,Ex(\tau_0)$, and for $\tau \le \sigma$,

$$Ex(\tau)x'(\sigma) = K(\tau,\tau_0)\left[Ex(\tau_0)x'(\tau_0)\right]K'(\sigma,\tau_0) + \left[\int_{\tau_0}^{\tau} K(\tau,\rho)\,Q(\rho)\,K'(\tau,\rho)\,d\rho\right] K'(\sigma,\tau).$$